“This book equips the student with a substantial
background in the basic operations of
computation, approximation, interpolation,
numerical differentiation and integration...”


About The Book... 


Like the first edition, this edition provides
a mathematical introduction to the fundamental
processes of numerical analysis.
The book equips the student with a sub- 
stantial background in the basic operations 
of computation, approximation, interpola- 
tion, numerical differentiation and integra- 
tion, and the numerical solution of 
equations, as well as in applications to such 
processes as the smoothing of data, the 
numerical summation of series, and the 
numerical solution of ordinary differential
equations. Dr. Hildebrand has attempted
to evince the underlying hypotheses in 
deriving the necessary formulas, and also 
to take into account relevant problems of 
error analysis, convergence, and stability. 


In addition to effecting many changes in 
the treatments of the first edition, the revi- 
sion introduces a selection of the more 
significant recent developments in the field 
and, correspondingly, broadens the focus 
of the text upon concepts and procedures 
associated with computers. New material 
includes sections on machine errors and on 
recursive computation, the consideration 
of Romberg and Filon integration and 
increased emphasis on the midpoint rule, 
and treatments of spline approximation 
and of minimax approximation by poly- 
nomials and rational functions. Other new 
topics include pivoting and equilibration, ill 
conditioning, Crout’s method for tri- 
diagonal systems, the iterative methods of 
Muller, Traub, Ostrowski, and others, modi- 
fications of the method of false position 
and of the Lin and Bairstow procedures, 
expanded treatments of the concept of 
“order” of an iterative process, and addi- 
tional methods for solving sets of non- 
linear equations. Reference lists have been 
updated, and there are more than 150
new problems.
PREFACE


The preface to the first edition contained the following passages: 


This volume is intended to provide an introductory treatment of the funda- 
mental processes of numerical analysis which is compatible with the expansion of the 
field brought about by the development of the modern high-speed calculating devices, 
but which also takes into account the fact that very substantial amounts of computation 
will continue to be effected by desk calculators (and by hand or slide rule), and that 
familiarity with computation on a desk calculator is a desirable preliminary to large- 
scale computation in any case. 

The present text is based on the premise that the introductory course should 
provide a fairly substantial grounding in the basic operations of computation, 
approximation, interpolation, numerical differentiation and integration, and the 
numerical solution of equations, as well as in applications to such processes as the 
smoothing of data, the numerical summation of series, and the numerical solution of 
ordinary differential equations. It is believed that this course not only should exhibit 
techniques available for each purpose, but also should attempt to derive the relevant 
formulas in such a way that the underlying hypotheses are in evidence and that methods 
of generalization and modification are reasonably apparent, and that the problems of 
error analysis, convergence, and stability should be treated as adequately as time and 
preparation permit. Furthermore, the course desirably should be accompanied by a 
problem laboratory, in which enough actual computation is effected (presumably by 
use of desk calculators) to establish the practical significance of the theoretical
developments.

Such an introduction should afford preparation for an “advanced course,” 
dealing with certain of the somewhat more sophisticated aspects of the solution of 
equations and with modern methods of matrix inversion and determination of 
characteristic values of matrices, together with the numerical solution of partial 
differential equations and of integral equations, . . .

These remarks apply nearly as well to the present edition and to the 
philosophy underlying it. In particular, it is believed to be unrealistic to suppose 
that henceforth essentially all computation will be delegated to large-scale
programed digital computers; that soon sufficiently efficient algorithms will exist to
permit one to entrust most problems directly to a computer without preliminary 
or subsequent individual analyses and without a fair understanding of the 
rationale of the algorithms to be employed; and that, accordingly, the “numerical 
analysis” which is to be relevant in the classroom properly divides into the con- 
sideration of flow charts and of the technique and logic of computer pro- 
graming, on the one hand, and the study of relatively profound mathematical 
treatments of special areas, on the other. Excellent texts for these two purposes 
are available and are under preparation; but, in addition, it is thought that there 
exists a continuing need for suitable texts which follow a more classical middle 
course. 

This edition primarily was intended to introduce a selection of the more 
recent significant developments in the field and, correspondingly, to increase the 
focus of the text upon concepts and procedures associated with computers. At 
the same time, many changes have been made in phraseology, notation, and 
arrangement of the earlier material and a number of the treatments have been 
modified and amplified (or abbreviated). 

The new material introduced includes sections on machine errors and on 
recursive calculation in Chapter 1; increased emphasis on the midpoint rule 
and the consideration of Romberg integration and the classical Filon integration 
in Chapter 3; a modified treatment of prediction-correction methods and the 
addition of Hamming’s method in Chapter 6; the use of recursive methods in 
Chapters 7 and 8; brief considerations of uniform (minimax) approximation by 
polynomials and rational functions and four sections on spline approximation 
in Chapter 9; and treatments of pivoting and equilibration, ill conditioning, 
Crout’s method for tridiagonal systems, the iterative methods of Muller, Traub, 
Ostrowski, and others, modifications of the method of false position and of the 
Lin and Bairstow procedures, and expanded treatments of the concept of 
the “order” of an iterative process, as well as additional methods for solving 
sets of nonlinear equations, in Chapter 10. Some results bearing on error 
analysis and on the convergence of approximation sequences, but depending 
upon some familiarity with analytic functions of a complex variable, were
inserted as text material or as annotated problems in Sections 3.9 and 4.11. 
Reference lists were expanded and updated, and more than one hundred and 
fifty new problems were added. 

As in the first edition, the chapter treating the numerical solution of 
equations is essentially independent of the other chapters and is placed at the 
end of the text (Chapter 10) so that relevant portions of its content can be 
inserted when they are needed in other developments at the discretion of the 
instructor. Thus, for example, some information relative to the practical 
solution of sets of linear algebraic equations should precede the consideration 
of least-squares methods in Chapter 7. Alternatively, it may be desirable to 
introduce part or all of Chapter 10 immediately following Chapter 1. 

The Bibliography (Appendix B) includes only books and papers to which 
explicit reference was made either in the text material or in the Supplementary 
References sections at the ends of chapters. Obviously, it is not exhaustive in 
any category, and many outstanding numerical analysis texts and research 
contributions are omitted. In order to facilitate the use of the text for reference 
purposes, a Directory of Methods is included as Appendix C. 

The author has profited from criticisms and encouragements by a rather 
long list of colleagues and students and is particularly indebted to Professor 
Philip Rabinowitz for many helpful suggestions relative to the present edition. 


F. B. HILDEBRAND 
INTRODUCTION


1.1 Numerical Analysis 


The ultimate aim of the field of numerical analysis is to provide convenient 
methods for obtaining useful solutions to mathematical problems and for 
extracting useful information from available solutions which are not expressed 
in tractable forms. Such problems may each be formulated, for example, in 
terms of an algebraic or transcendental equation, an ordinary or partial dif- 
ferential equation, or an integral equation, or in terms of a set of such equations. 

This formulation may correspond exactly to the situation which it is 
intended to describe; more often, it will not. Analytical solutions, when 
available, may be precise in themselves, but may be of unacceptable form 
because of the fact that they are not amenable to direct interpretation in 
numerical terms, in which case the numerical analyst may attempt to devise a 
method for effecting that interpretation in a satisfactory way, or he may prefer 
to base his analysis instead upon the original formulation. 

More frequently, there is no known method of obtaining the solution 
in a precise form, convenient or otherwise. In such a case, it is necessary either 
to attempt to approximate the problem satisfactorily by one which is amenable 
to precise analysis, to obtain an approximate solution to the original problem
by methods of numerical analysis, or to combine the two approaches. 

On the other hand, the problem itself may not be clearly defined, and 
the analyst may be provided only with its partial solution, perhaps in the form 
of a table of approximate data, together with a certain amount of information 
with regard to its reliability, or perhaps in terms of an integral defining a 
function which cannot be expressed in terms of a finite number of tabulated 
functions. His purpose then is to obtain additional useful information concern- 
ing the function so described. 

Generally the numerical analyst does not strive for exactness. Instead, 
he attempts to devise a method which will yield an approximation differing 
from exactness by less than a specified tolerance, or by an amount which has 
less than a specified probability of exceeding that tolerance. When the informa- 
tion supplied to him is inexact, he attempts both to obtain a dependable measure 
of the uncertainty which results from that inexactness and also to obtain an 
approximation which possesses a specified reliability compatible with that 
uncertainty. 

He tries to devise a procedure which would be capable of affording an 
arbitrarily high degree of accuracy, in a wide class of situations, if the reliability 
of given information and of available calculating devices were correspondingly 
high. Even when successful in this attempt, he still seeks alternative procedures 
which may possess certain advantages in convenience or efficiency in certain 
situations, but which may be of less general applicability, or which may have 
the property that the degree of accuracy obtainable, even under ideal circum- 
stances, cannot exceed a certain limit which depends upon the function to be 
analyzed. In this last case, which is of frequent occurrence, he attempts to
ascertain that limit and to classify the situations in which it is not sufficiently 
high. 

Needless to say, there are relatively few situations in which all these 
objectives have been, or can be, perfectly attained, as will be illustrated in the 
sequel. However, research with these aims in view continues to provide new 
procedures, as well as additional information with regard to the basic advantages 
and disadvantages of the older ones. Additional impetus has been afforded by 
the development of automatic desk calculators and, more recently, of large-scale 
computers. For example, certain methods had long been known to possess 
important theoretical advantages, but had not been convenient, from the point 
of view of the labor and time involved, for use in hand calculation or in calcula- 
tion based on the use of the slide rule or of tables of logarithms, and hence 
had been considered as little more than mathematical curiosities. However, 
technological developments have promoted several of them into a much more 
active status and have also created additional need for detailed reexamination
and modification of other existing methods and for a search for new ones. 

One of the most rapidly expanding phases of numerical analysis is that 
which deals with the approximate solution of partial differential equations. 
But a basic understanding of the more involved problems which arise in that 
phase of the analysis depends strongly upon familiarity with similar problems 
which arise, in a somewhat simpler way, in connection with the solution of 
algebraic and transcendental equations, the processes of interpolation and 
approximation, numerical differentiation and integration, and the approximate 
solution of ordinary differential equations, in which only one independent 
variable is involved. These are the topics which are to be treated, for the most 
part, in what follows. 


1.2 Approximation 


In many of the problems which arise in numerical analysis, we are given certain 
information about a certain function, say f(x), and we are required to obtain 
additional or improved information, in a form which is appropriate for inter- 
pretation in terms of numbers. Usually f(x) is known or required to be con- 
tinuous over the range of interest. 

A technique which is frequently used in such cases can be described, in 
general terms, as follows. A convenient set of n + 1 coordinate functions,
say $\phi_0(x), \phi_1(x), \ldots, \phi_n(x)$, is first selected. Then a procedure is invented
which has the property that it would yield the desired additional information
simply and exactly (barring inaccuracies in calculation) if f(x) were a member
of the set $S_n$ of all functions which are expressible exactly as linear combinations
of the coordinate functions. Next, use is made of an appropriate selective
process which tends to choose from among all functions in $S_n$ that one, say
$y_n(x)$, whose properties are as nearly as possible identified with certain of the
known properties of f(x). In particular, it is desirable that the process be one
which would select f(x) if f(x) were in $S_n$. The required property of f(x) is then
approximated by the corresponding property of $y_n(x)$. Finally, a method is
devised for using additional known properties of f(x), which were not employed
in the selective process, for estimating the error in this approximation.

Clearly, it is desirable, first of all, to choose coordinate functions which 
are convenient for calculational purposes. The n + 1 functions $1, x, x^2, \ldots, x^n$,
which generate the algebraic polynomials of degree n or less, are particularly
appropriate, since polynomials are readily evaluated and since their integrals, 
derivatives, and products are also polynomials. 

Of much greater importance, however, is the natural requirement that 
it be possible, by taking n sufficiently large, to be certain that the set $S_n$ of
generated functions will contain at least one member which approximates the
function f(x) within any preassigned tolerance, on the interval of interest. It
is a most fortunate fact that the convenient set $S_n$, which consists of all
polynomials of degree n or less, possesses this property if only f(x) is continuous on
that interval and the interval is of finite extent.

This fact was established in 1885 by a famous theorem of Weierstrass, 
which states, in fact, that any function f(x) which is continuous on a closed 
interval [a, b] can be uniformly approximated within any prescribed tolerance, 
on that interval, by some polynomial. By this statement we mean that, given 
any positive tolerance $\epsilon$, there is a polynomial p(x) such that $|f(x) - p(x)| \le \epsilon$
for all x such that $a \le x \le b$.

Principally for the two reasons just given, polynomial approximation is 
of wide general use when the function to be approximated is continuous and the 
interval of approximation is finite, as well as in certain other cases, and accord- 
ingly is to form the basis of much of the work which follows. Other types of 
approximations are considered in Chap. 9. 

Following the choice of the set $S_n$, an appropriate selective process must
be chosen in accordance with the nature of the available information concerning
the function f(x). When the value of f(x) is known for at least n + 1 values of
x, say $x_0, x_1, \ldots, x_n$, the simplest and most often used process consists of
selecting, from the members of $S_n$, a function $y_n(x)$ which takes on the same
value as does f(x) for each of those n + 1 values of x. Here again the choice of
polynomials is convenient. For, whereas in the general case there may be no
such function in $S_n$, or there may be several, it is a well-known fact that there
exists one and only one polynomial of degree n or less which takes on prescribed
values at each of n + 1 distinct points. In particular, if f(x) is indeed in $S_n$, this process
will then select it. Other useful processes are described in Chaps. 7 and 9.
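
The uniqueness property just described can be checked on a small case. The following is only a sketch, not a method taken from the text: it assumes the monomials 1, x, ..., x^n as coordinate functions and finds the coefficients by solving the resulting linear system in exact rational arithmetic. The function name `interpolating_coefficients` and the sample cubic are illustrative choices.

```python
from fractions import Fraction


def interpolating_coefficients(xs, ys):
    """Return c_0..c_n of the polynomial of degree <= n through the points.

    Assumes the abscissas xs are distinct, so the linear system has
    exactly one solution. Gauss-Jordan elimination on exact fractions.
    """
    n = len(xs)
    # Augmented rows [1, x, x^2, ..., x^(n-1) | y] for each data point.
    rows = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)]
            for x, y in zip(xs, ys)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [v / scale for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[k][n] for k in range(n)]


# f(x) = 2 - 3x + x^3 lies in S_3; sample it at four distinct points.
xs = [-1, 0, 1, 2]
ys = [2 - 3 * x + x ** 3 for x in xs]
coeffs = interpolating_coefficients(xs, ys)
print([str(c) for c in coeffs])
```

Because the sampled function is itself a polynomial of degree 3, the process recovers its coefficients exactly, which is the sense in which the selective process "selects f(x) if f(x) is in $S_n$."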

The final problem, that of devising an appropriate method of estimating 
the error, is a troublesome one and cannot be discussed at this point. Clearly, 
the precision of the estimate must depend upon the amount of available in- 
formation relative to f(x), and its usefulness will depend upon the form in 
which that information is supplied. In particular, if all available information 
is needed by the selective process, no error estimate is possible. 

It is of some importance to notice that, if $S_n$ is indeed taken as the set of
all polynomials of degree n or less, then the Weierstrass theorem guarantees
only the existence of a member of $S_n$ which affords a satisfactory approximation
to a continuous function f(x), on any finite interval, when n is sufficiently large.
This does not imply that the particular member chosen by a particular selective
process will tend to afford such an approximation as n increases indefinitely.
Only when a dependable method of estimating the error is available can this
question be resolved with certainty. Furthermore, even though it were possible
to devise a selective process which had this property, it would not necessarily
follow (for example) that the derivative of the selected polynomial $y_n(x)$ would
tend to approximate the derivative of f(x), even though the latter were known to 
exist and to be continuous. Again, recourse must be had to an error analysis. 

When the selective process specifies n + 1 instances of exact agreement 
between the function f(x) and its approximation and/or between certain of their 
derivatives, on a discrete set of points, the resultant approximation (if it exists) 
is called an interpolation. In particular, when functional agreement is prescribed 
at n + 1 distinct points, the interpolation process is sometimes said to be 
collocative. In some references the term interpolation is used in a somewhat 
more general sense. (See, for example, P. J. Davis [1963].†)


1.3 Errors 


Most numerical calculations are inexact either because of inaccuracies in given 
data, upon which the calculations are based, or because of inaccuracies intro- 
duced in the subsequent analysis of those data. In addition to gross errors, 
occasioned by unpredictable mistakes (human or mechanical) and hypothetically 
assumed to be absent in the remainder of this discussion, it is convenient to 
define first a roundoff error as the consequence of using a number specified by n 
correct digits to approximate a number which requires more than n digits 
(generally infinitely many digits) for its exact specification.‡ Such errors are
frequently present in given data, in which case they may be called inherent 
errors, due either to the fact that those data are empirical, and are known 
only to n digits, or to the fact that, whereas they are known exactly, they are 
“rounded” to n digits according to the dictates of convenience or of the capacity 
of a calculating device. They are introduced in subsequent analysis either 
because of deliberate rounding or because of the fact that a calculating device 
is capable of supplying only a certain number of digits in effecting operations 
such as addition, multiplication, division, conversion between number systems, 
and so forth. 

† A name followed by a date in brackets refers to an item in Appendix B, the
Bibliography.

‡ It is assumed here and elsewhere, for simplicity and consistency, that the decimal
notation is used.

It is then convenient to define a truncation error, by exclusion, as any
error which is neither a gross error nor a roundoff error. Thus, a truncation
error is one which would be present even in the hypothetical situation in which
no “mistakes” were made, all given data were exact, and infinitely many digits
were retained in all calculations. Frequently, a truncation error corresponds to 
the fact that, whereas an exact result would be afforded (in the limit) by an 
infinite sequence of steps, the process is truncated after a certain finite number of 
steps. However, it is rather conventional to apply the term in the more general 
sense defined here. 

We define the error associated with an approximate value as the result 
of subtracting that approximation from the true value, 


True value = approximation + error (1.3.1) 


with a remark that both this definition and that in which the algebraic sign of 
the error is reversed are used elsewhere in the literature. 

The preceding definitions can be illustrated, for example, by calculations 
based on the use of power series. Thus, if a function f(x) possesses n + 1 
continuous derivatives everywhere on the interval [a, x], it can be represented
by a finite Taylor series of the form

$$
f(x) = f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots
       + \frac{f^{(n)}(a)}{n!}(x - a)^n + \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - a)^{n+1}
\tag{1.3.2}
$$

where $\xi$ is some number between a and x. If f(x) satisfies more stringent
conditions, it can be represented by an infinite Taylor series

$$
f(x) = f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots
       + \frac{f^{(n)}(a)}{n!}(x - a)^n + \cdots
\tag{1.3.3}
$$

when $|x - a|$ is sufficiently small.

If f(x) is approximated by the sum of the first n + 1 terms of (1.3.3),
then the error committed is represented by the last term (remainder) in (1.3.2).
Thus, for example, if $f(x) = e^{-x}$ and a = 0, we have the relation

$$
e^{-x} = 1 - x + \tfrac{1}{2}x^2 - \tfrac{1}{6}x^3 + E_T(x) \tag{1.3.4}
$$

where the truncation error is of the form

$$
E_T = \tfrac{1}{24} e^{-\xi} x^4 \qquad (\xi \text{ between } 0 \text{ and } x) \tag{1.3.5}
$$

If x is positive, the same is true of $\xi$, and, by making use of the fact that $e^{-\xi}$
is then smaller than unity, we may deduce that the approximation

$$
e^{-x} \approx 1 - x + \tfrac{1}{2}x^2 - \tfrac{1}{6}x^3 \tag{1.3.6}
$$

is in error by a positive amount smaller than $\tfrac{1}{24}x^4$. In particular, we have

$$
e^{-1/3} \approx 1 - \tfrac{1}{3} + \tfrac{1}{18} - \tfrac{1}{162} = \tfrac{116}{162} \tag{1.3.7}
$$

with an error between zero and $\tfrac{1}{1944}$. Since $\tfrac{1}{1944} \doteq 0.00051$, where the symbol
$\doteq$ is used to signify “rounds to,” the truncation error is smaller than $5.2 \times 10^{-4}$.
If $\tfrac{116}{162}$ is rounded to four places, to give $e^{-1/3} \approx 0.7160$, the additional error
introduced by the roundoff is less than (but here very nearly equal to) five
units in the first neglected place and hence smaller than $0.5 \times 10^{-4}$. It follows
finally that $e^{-1/3} \approx 0.7160$ with an error of magnitude smaller than $5.7 \times 10^{-4}$.
However, whether $e^{-1/3} \doteq 0.716$ or 0.717 is not established. If each of the terms
in (1.3.7) were rounded to four places before the terms were combined, a total
roundoff error as great as $1.5 \times 10^{-4}$ would be possible. Finally, if the
exponent $\tfrac{1}{3}$ represented only an approximation to a value of x, which was not
known exactly but which was known to lie, say, between 0.333 and 0.334, the
approximate maximum error due to this uncertainty could be determined by
noticing that the change $\delta e^{-x}$ corresponding to a small change $\delta x$ is approximately
$(de^{-x}/dx)\,\delta x = -e^{-x}\,\delta x$. Thus, if the number $\tfrac{1}{3}$ is in error by an
amount between $-3 \times 10^{-4}$ and $+7 \times 10^{-4}$, the magnitude of the maximum
corresponding error in the calculated value is about $(0.716)(7 \times 10^{-4}) \doteq
5 \times 10^{-4}$.
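
The three kinds of error in this example can be reproduced numerically. The sketch below is an illustration rather than part of the original treatment: it assumes that the library value `math.exp(-1/3)` may stand in for the true value, and the variable names are arbitrary.

```python
from fractions import Fraction
import math

x = Fraction(1, 3)

# Four-term approximation (1.3.6) evaluated exactly at x = 1/3.
approx = 1 - x + x ** 2 / 2 - x ** 3 / 6      # equals 116/162
bound = x ** 4 / 24                            # equals 1/1944

# Library value used as a stand-in for the true value (an assumption).
true_value = math.exp(-1 / 3)

# Error = true value - approximation, the sign convention of (1.3.1).
truncation_error = true_value - float(approx)

# Rounding the rational approximation to four decimal places.
rounded = round(float(approx), 4)
roundoff_error = float(approx) - rounded
total_error = true_value - rounded

# Inherent error if the exponent is only known to lie in [0.333, 0.334].
inherent_error = max(abs(math.exp(-0.333) - true_value),
                     abs(math.exp(-0.334) - true_value))

print("approximation   :", approx, "=", float(approx))
print("truncation error:", truncation_error, "< bound", float(bound))
print("roundoff error  :", roundoff_error)
print("total error     :", total_error)
print("inherent error  :", inherent_error)
```

The printed truncation error is positive and below 1/1944, the roundoff error is just under half a unit in the fourth place, and the inherent error is close to the estimate of about 5 × 10⁻⁴ obtained from the derivative.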

The magnitude of the roundoff errors could be reduced arbitrarily by 
retaining additional digits, and that of the truncation error could be reduced 
within any prescribed tolerance by retaining sufficiently many terms of the 
convergent Maclaurin expansion of e~*. The effect of an inherent error could 
be reduced only if the uncertainty of the value of x were decreased. 

It is useful to notice that, since the sign of the truncation error associated
with (1.3.7) is known, the magnitude of the maximum possible error due to
truncation can be halved by replacing the approximation $\tfrac{116}{162}$ by the approximation
$\tfrac{116}{162} + \tfrac{1}{2} \cdot \tfrac{1}{1944} = \tfrac{2785}{3888} \doteq 0.7163$, with a corresponding truncation error
accordingly known to lie between the limits $\pm 2.6 \times 10^{-4}$.
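
The effect of this shift can be verified directly. As before, this is a sketch that takes `math.exp(-1/3)` as the true value; the names are illustrative.

```python
from fractions import Fraction
import math

approx = Fraction(116, 162)        # approximation (1.3.7)
bound = Fraction(1, 1944)          # truncation error lies in (0, bound)

# Shift the approximation to the midpoint of the known error interval.
shifted = approx + bound / 2
shifted_error = math.exp(-1 / 3) - float(shifted)

print(shifted, "=", float(shifted))
print("error after shift:", shifted_error)
print("half-width of the error interval:", float(bound / 2))
```

Since the original error is known to lie between 0 and 1/1944, the error of the shifted value must lie within half that width on either side of zero.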

As an example of a somewhat different nature, we refer to the relation

$$
\int_x^\infty \frac{e^{x-t}}{t}\,dt = \frac{1}{x} - \frac{1!}{x^2} + \frac{2!}{x^3} - \frac{3!}{x^4} + \cdots + (-1)^{n-1}\frac{(n-1)!}{x^n}
  + (-1)^n n! \int_x^\infty \frac{e^{x-t}}{t^{n+1}}\,dt \qquad (x > 0) \tag{1.3.8}
$$

which is readily established by successive integration by parts. If we denote the
left-hand member by F(x), we can thus write

$$
F(x) \approx \frac{1}{x} - \frac{1!}{x^2} + \frac{2!}{x^3} - \cdots + (-1)^{n-1}\frac{(n-1)!}{x^n} \tag{1.3.9}
$$

when x > 0, with a truncation error

$$
E_T(x, n) = (-1)^n n! \int_x^\infty \frac{e^{x-t}}{t^{n+1}}\,dt \tag{1.3.10}
$$


Since $x - t$ is nonpositive in the range of integration, so that $e^{x-t} \le 1$,
we may deduce that

$$
|E_T(x, n)| < n! \int_x^\infty \frac{dt}{t^{n+1}}
$$

or

$$
|E_T(x, n)| < \frac{(n-1)!}{x^n} \tag{1.3.11}
$$

Hence the truncation error is smaller in absolute value than the last term
retained in the approximation and also is evidently of opposite sign.
Further, since $1/t^{n+1} < 1/x^{n+1}$ in the integration range, we see that

$$
|E_T(x, n)| < \frac{n!}{x^{n+1}} \int_x^\infty e^{x-t}\,dt = \frac{n!}{x^{n+1}} \tag{1.3.12}
$$

so that the truncation error here is also smaller in absolute value than the
first term neglected, and is of the same sign.

For a fixed number (n) of terms, the truncation error clearly is small 
when x is large and can be made arbitrarily small by taking x sufficiently large. 
However, for a given x, the error cannot be made arbitrarily small by retaining 
sufficiently many terms. In fact, we may notice that if the right-hand member 
of (1.3.9) were considered as the result of retaining the first n terms of an infinite
series, then the ratio of the (n + 1)th term of that series to the nth would be
—n/x. Hence, the successive terms decrease steadily in magnitude as long as 
n < x, but then increase unboundedly in magnitude as n increases beyond x. 
Thus the series does not converge for any value of x. 

Nevertheless, it is useful for computation when x is fairly large. Thus,
if x = 10, the smallest term occurs when n = x = 10 and is given by
$-9!/10^{10} \doteq -3.6 \times 10^{-5}$. Thus, the approximation afforded by retention
of 10 terms would be in error by a positive quantity smaller than $4 \times 10^{-5}$.
This would be the best possible guaranteed accuracy obtainable from (1.3.9),
when x = 10, since retention of additional terms would increase the possible
magnitude of the error.
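
The behavior of the divergent expansion at x = 10 can be observed numerically. The sketch below makes two assumptions that are not in the text: the reference value of F(x) is obtained from the substitution t = x + u, which gives F(x) as the integral of e^{-u}/(x + u) over u ≥ 0, and that integral is evaluated by Simpson's rule on a truncated range. The function names and the truncation point are illustrative.

```python
from fractions import Fraction
import math


def term(k, x):
    """k-th term (k = 1, 2, ...) of (1.3.9) for a positive integer x."""
    return Fraction((-1) ** (k - 1) * math.factorial(k - 1), x ** k)


def partial_sum(n, x):
    """Sum of the first n terms of (1.3.9)."""
    return sum(term(k, x) for k in range(1, n + 1))


def F_reference(x, upper=60.0, steps=60000):
    """Simpson's rule for the integral of e^{-u}/(x+u), u from 0 to upper.

    The neglected tail beyond `upper` is smaller than e^{-upper}.
    `steps` must be even.
    """
    h = upper / steps
    g = lambda u: math.exp(-u) / (x + u)
    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3


x = 10
# Index of the term of smallest magnitude among the first twenty.
smallest = min(range(1, 21), key=lambda k: abs(term(k, x)))
reference = F_reference(x)
error_10 = reference - float(partial_sum(10, x))
error_20 = reference - float(partial_sum(20, x))

print("smallest term at n =", smallest, "value", float(term(smallest, x)))
print("error with 10 terms:", error_10)
print("error with 20 terms:", error_20)
```

With ten terms the error is positive and below 4 × 10⁻⁵, while doubling the number of terms makes the result considerably worse, as the bounds (1.3.11) and (1.3.12) suggest.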

A divergent series of the type just considered, for which the magnitude of
the error associated with retention of only n terms can be made arbitrarily
small by taking a parameter x sufficiently near a certain fixed value $x_0$ (or
sufficiently large in magnitude), and for which the error first decreases as n
increases but eventually increases unboundedly in magnitude with increasing
n, when x is given a fixed value other than $x_0$, is often called an asymptotic
series. An example of the former type, with $x_0 = 0$, is afforded by the relation

$$
\int_0^\infty \frac{e^{-u}}{1 + xu}\,du = 1 - 1!\,x + 2!\,x^2 - 3!\,x^3 + \cdots + (-1)^n n!\,x^n + E_T(x, n)
$$

when $x \ge 0$, which can be obtained from (1.3.8) by replacing x by 1/x and
making the change of variables t = (1 + xu)/x in the integral, and for which
it is true that $x^{-n} E_T(x, n) \to 0$ as x tends to zero from the positive direction,
but $|E_T(x, n)| \to \infty$ as $n \to \infty$ for any fixed x > 0.

For a representation of the form

$$
f(x) = a_0 + \frac{a_1}{x} + \frac{a_2}{x^2} + \cdots + \frac{a_n}{x^n} + E_T(x, n)
$$

it is usually stipulated also (following Poincaré) that $x^n E_T(x, n)$ is to tend to
zero as $|x| \to \infty$; for an expansion of the form

$$
f(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)^2 + \cdots + a_n(x - x_0)^n + E_T(x, n)
$$

the additional requirement that $E_T(x, n)/(x - x_0)^n$ is to tend to zero as $x \to x_0$
is usually imposed. Equation (1.3.12) shows that (1.3.9) is thus asymptotic
in the strict sense. However, the term is often applied somewhat more loosely
to expansions of more general type, which are not necessarily power series.

When x is fixed, the error frequently decreases rapidly, as additional 
terms are taken into account, until a point of diminishing return is reached, 
after which the error begins to increase in magnitude. In such cases, if the error 
is reduced within the prescribed tolerance before that point is attained, then the 
approximate calculation can be successfully effected. 

A great many of the expansions which are of frequent use in numerical 
analysis are essentially of this type. For them, the term truncation error generally 
applies only in the general sense of the definition given earlier in this section, 
and generally does not correspond to the result of truncating a convergent infinite 
process after a finite number of steps, but to the result of truncating a process 
which first tends to converge, but would ultimately diverge, at a stage before 
the tendency to diverge manifests itself. 

The danger inherent in an unjustified assumption that a particular rep- 
resentation is of this type can be illustrated in terms of the function 


$$
f(x) = \frac{1 - 9.9x - 49.99x^2 - 4.999x^3}{1 - 10x - 49x^2}
$$

If, for small values of x, f(x) is to be approximated by a finite Taylor series in
powers of x, the leading terms can be obtained in the form

$$
f(x) \approx 1 + 0.1x + 0.01x^2 + 0.001x^3 + \cdots
$$

by long division or otherwise. Thus, for example, when x = 0.1 the first four
approximations to f(0.1), obtained as partial sums, are 1, 1.01, 1.0101, and
1.010101. Whereas most rules of thumb would suggest that the error in the
fourth approximation is positive and less than $10^{-6}$, calculation shows that

$$
f(0.1) \doteq 1.009998
$$

and hence that the error in fact is negative, with magnitude exceeding 100 units
in the sixth place.
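
The figures quoted in this example are easily confirmed. The following sketch simply evaluates the rational function and the four partial sums in floating-point arithmetic; the names `f` and `partial_sums` are illustrative.

```python
def f(x):
    """The rational function of the example."""
    numerator = 1 - 9.9 * x - 49.99 * x ** 2 - 4.999 * x ** 3
    denominator = 1 - 10 * x - 49 * x ** 2
    return numerator / denominator


x = 0.1
# Partial sums of 1 + 0.1x + 0.01x^2 + 0.001x^3 at x = 0.1.
coefficients = [1.0, 0.1, 0.01, 0.001]
partial_sums = []
running = 0.0
for k, c in enumerate(coefficients):
    running += c * x ** k
    partial_sums.append(running)

# Error = true value - approximation, as in (1.3.1).
error = f(x) - partial_sums[-1]

print("partial sums:", partial_sums)
print("f(0.1)      :", f(x))
print("error in the fourth approximation:", error)
```

The steadily shrinking differences between successive partial sums give no warning that the fourth approximation is in error by about one unit in the fourth decimal place.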


1.4 Significant Figures 


The conventional process of rounding or “forcing” a number to n digits (or
“figures”) consists of replacing that number by an n-digit approximation with
minimum error.† When this requirement leads to two permissible roundings,
that one for which the nth digit of the rounded number is even is generally
selected. With this rule, the associated error is never larger in magnitude than
one-half unit in the place of the nth digit of the rounded number.‡ Thus
$4.05149 \doteq 4.0515$, 4.051, 4.05, 4.1, and 4. It may be noted here that whereas
4.05149 rounds to 4.0515, which in turn rounds to 4.052, nevertheless 4.05149
rounds directly to 4.051. Thus rounding is not necessarily transitive.
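
The rounding rule and its lack of transitivity can be illustrated with decimal arithmetic. This is a sketch using Python's standard `decimal` module with its round-half-to-even mode; the helper name `round_sig` is an illustrative choice.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_sig(value, n):
    """Round a decimal number to n significant digits, ties to even."""
    d = Decimal(value)
    # Exponent of the place occupied by the n-th significant digit.
    exponent = d.adjusted() - (n - 1)
    return d.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_EVEN)


number = "4.05149"
for n in (5, 4, 3, 2, 1):
    print(n, "digits:", round_sig(number, n))

# Rounding in two stages differs from rounding directly.
two_stage = round_sig(round_sig(number, 5), 4)
direct = round_sig(number, 4)
print("two-stage:", two_stage, " direct:", direct)
```

The two-stage result ends in an even digit because 4.0515 lies exactly halfway between 4.051 and 4.052, whereas the original number lies below that midpoint.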

The errors introduced in the rounding of a large set of numbers, which are 
to be combined in a certain way, usually (but not always) tend to be equally 
often positive and negative, so that their effects often tend to cancel. The slight 
favoring of even numbers is prompted by the fact that any subsequent operations 
on the rounded numbers are then somewhat less likely to necessitate additional 
roundoffs. 

Each digit of a number, except a zero which serves only to fix the position 
of the decimal point, is called a significant digit or figure of that number. Thus, 
the numbers 2.159, 0.04072, and 10.00 each contain four significant figures. 


† This type of abridgment is to be distinguished from the process of chopping, which
consists of merely discarding all digits following the $n$th digit without modifying the
$n$th digit and which must be used when capacity limitations of a calculating device
do not permit the determination of more than $n$ digits.

‡ It may be noticed that, when 9.95 is rounded to 10.0, the result still contains three
correct digits; the error amounts to one-half unit in the place of the third figure of
the rounded number but to five units in the place of the third figure of the original
number.

Whether or not the last digit of 14620 is significant depends upon the context.
If “a number known to be between 14615 and 14625” is intended, then that
zero is not significant and the number would preferably be written in the form
$1.462 \times 10^4$. Otherwise the form $1.4620 \times 10^4$ would be appropriate.

More generally, if any approximation $\bar N$ to a number $N$ has the property
that both $N$ and $\bar N$ round to the same set of $n$ significant figures, and if $n$ is the
largest integer for which this statement is true, then $\bar N$ may be said to approximate
$N$ to $n$ significant figures. Thus, if $N = 34.655000\cdots$ and $\bar N = 34.665000\cdots$,
then $n = 4$. Clearly, the error $N - \bar N$ cannot exceed one unit of the place of
the $n$th digit, but, as this example illustrates, the error may take on that maximum
value. In the case when $N = 38.501\cdots$ and $\bar N = 38.499$, while it is true that
$N$ and $\bar N$ both round to the same four significant digits (so that here $n = 4$),
it may be noted that they do not round to the same two significant digits in
spite of the fact that the error is less than three units in the place of the fifth
digit. This point is of practical importance only in that it illustrates the fact
that, no matter how accurately a calculation is to be effected, the result of
rounding the calculated value to $n$ digits cannot be guaranteed in advance to
possess $n$ correct digits but may differ from the rounded true value by one unit
in the last digit.

It may be seen that the concept of significant figures is related more
intimately to the relative error

\[ \text{Relative error} = \frac{\text{true value} - \text{approximation}}{\text{true value}} \tag{1.4.1} \]

than to the error (or the absolute error) itself. In order to exhibit the relationship
more specifically, it is useful to define $N^*$ and $r$ such that

\[ N = N^* \times 10^r \qquad \text{where } 1 \le N^* < 10 \tag{1.4.2} \]

where $r$ is an integer and hence is that integer for which $10^r \le N < 10^{r+1}$,
when $N$ is positive. Thus $N^* = N$ when $1 \le N < 10$, $= N/10$ when
$10 \le N < 100$, $= 10N$ when $0.1 \le N < 1$, and so forth. If we write $E = N - \bar N$
and $R = E/N$, for the error and relative error, respectively, and suppose that
$\bar N$ approximates $N$ to $n$ significant figures, so that

\[ |E| \le 10^{r-n+1} \tag{1.4.3} \]

it then follows that

\[ |R| = \frac{|E|}{N} \le \frac{10^{r-n+1}}{N^* \times 10^r} = \frac{10^{-n+1}}{N^*} \tag{1.4.4} \]

In particular, we have $|R| \le 10^{-n+1}$.

Further, if $\bar N$ is the result of rounding $N$ to $n$ significant figures, (1.4.3)
is then replaced by the stronger estimate

\[ |E| \le 5 \times 10^{r-n} \tag{1.4.5} \]

and there then follows

\[ |R| \le \frac{5}{N^*} \times 10^{-n} \tag{1.4.6} \]

In particular, $|R| \le 5 \times 10^{-n}$. If we also write

\[ E = \omega_n \times 10^{r-n+1} \tag{1.4.7} \]

it follows that $\omega_n$ is the error expressed in units of the place of the $n$th digit of $N$
and we have also

\[ \omega_n = N^* R \times 10^{n-1} \tag{1.4.8} \]

Suppose next that two numbers $N_1$ and $N_2$ are each rounded to $n$ significant
figures, and that the corresponding maximum error in the product
$\bar P = \bar N_1 \bar N_2$ of the rounded numbers is required. We notice first that, if $R(P)$
refers to $\bar P$ and $R_1$, $R_2$ to $\bar N_1$, $\bar N_2$, there follows

\[ R(P) = \frac{N_1N_2 - \bar N_1\bar N_2}{N_1N_2} = 1 - (1 - R_1)(1 - R_2) = R_1 + R_2 - R_1R_2 \]

Thus we see that $|R(P)|$ is largest when $R_1$ and $R_2$ are negative, and, from
(1.4.6), there follows

\[ |R(P)| \le 5\left(\frac{1}{N_1^*} + \frac{1}{N_2^*}\right) \times 10^{-n} + \frac{25}{N_1^*N_2^*} \times 10^{-2n} \]

Hence, by using (1.4.8), we obtain

\[ |\omega_n(P)| \le \frac{(N_1N_2)^*}{2}\left(\frac{1}{N_1^*} + \frac{1}{N_2^*}\right) + \frac{5(N_1N_2)^*}{2N_1^*N_2^*} \times 10^{-n} \tag{1.4.9} \]

Since $(N_1N_2)^* = N_1^*N_2^* \times 10^{-\rho}$, where $\rho$ is either 0 or 1, the right-hand
member of (1.4.9) is of the form

\[ \frac{10^{-\rho}}{2}\left(N_1^* + N_2^* + 5 \times 10^{-n}\right) \]

and the most unfavorable cases are those for which $\rho = 0$. Under this constraint,
the function $\phi(N_1^*, N_2^*) = \tfrac12(N_1^* + N_2^* + 5 \times 10^{-n})$ is to be considered
only for $1 \le N_1^* < 10$, $1 \le N_2^* < 10$, and $1 \le N_1^*N_2^* < 10$, and
clearly cannot take on a maximum value in the interior of this region. The
maximum value of $\phi$ on the boundary of the region is easily seen to occur when
either $N_1^* = 1$ and $N_2^* = 10$ or $N_1^* = 10$ and $N_2^* = 1$. Thus the right-hand
member of (1.4.9) cannot exceed the limiting value corresponding to $(N_1N_2)^* = 10-$
and either $N_1^* = 1$, $N_2^* = 10-$ or $N_1^* = 10-$, $N_2^* = 1$, and there
follows

\[ |\omega_n(N_1N_2)| < \frac{11}{2} + \frac{5}{2} \times 10^{-n} < 6 \tag{1.4.10} \]

where $\omega_n$ is the error expressed in units of the place of the $n$th digit of the true
value.

This means that if two numbers are rounded to $n$ significant figures, the
product of the rounded numbers differs from the true product by less than six
units in the place of its $n$th significant digit. In illustration, when $N_1 = 1.05+$
and $N_2 = 9.45+$ there follows $N_1N_2 = 9.9225+$, whereas, if $N_1$ and $N_2$
are rounded to two significant figures to give $\bar N_1 = 1.1$ and $\bar N_2 = 9.5$, there
follows $\bar N_1\bar N_2 = 10.45$. Thus, in this rather extreme case, $\omega_2 = -(5.275-)$.
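
The extreme case just quoted can be reproduced numerically. The sketch below uses the limiting values 1.05 and 9.45 themselves in place of 1.05+ and 9.45+, takes the error as true value minus approximation, and reads the right-hand member of (1.4.10) as $11/2 + (5/2) \times 10^{-n}$; the helper `error_in_units` is a name introduced here.

```python
from decimal import Decimal

def error_in_units(true_value, approximation, n):
    """Error (true minus approximate) in units of the nth significant digit of the true value."""
    unit = Decimal(1).scaleb(true_value.adjusted() - n + 1)
    return (true_value - approximation) / unit

n = 2
true_product = Decimal("1.05") * Decimal("9.45")     # limiting values 1.05+ and 9.45+
rounded_product = Decimal("1.1") * Decimal("9.5")    # factors rounded to two figures
omega = error_in_units(true_product, rounded_product, n)
bound = Decimal("5.5") + Decimal("2.5") * Decimal(10) ** (-n)   # right side of (1.4.10)

print(omega, bound)
```

The observed error of about 5.275 units lies just inside the bound 5.525 for $n = 2$, which shows how nearly the bound of six units can be attained.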

When $N_1 = N_2 = N$, the worst (limiting) situation is that in which
$(N^*)^2 = (N^2)^* = 10-$. Thus there follows

\[ |\omega_n(N^2)| < 10^{1/2} + \frac{5}{2} \times 10^{-n} < 4 \tag{1.4.11} \]

so that the square of a number rounded to $n$ significant digits differs from the
square of the unrounded number by less than four units in the place of its $n$th
significant digit.

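More generally, if we consider $P = N_1N_2 \cdots N_m$, we find that

\[ R(P) = 1 - (1 - R_1)(1 - R_2)\cdots(1 - R_m) \]

and hence

\[ |R(P)| \le (1 + |R_1|)(1 + |R_2|)\cdots(1 + |R_m|) - 1 \]

\[ |\omega_n(P)| \le \frac{(N_1\cdots N_m)^*}{2\alpha_n}\left[\left(1 + \frac{\alpha_n}{N_1^*}\right)\left(1 + \frac{\alpha_n}{N_2^*}\right)\cdots\left(1 + \frac{\alpha_n}{N_m^*}\right) - 1\right] \tag{1.4.12} \]

where

\[ \alpha_n = 5 \times 10^{-n} \tag{1.4.13} \]

Here the worst situation is that in which $m - 1$ of the $m$ numbers $N_i^*$ are 1
and the remaining one is $10-$, such that also

\[ N_1^*N_2^*\cdots N_m^* = (N_1N_2\cdots N_m)^* = 10- \]

Thus there follows, from (1.4.12),

\[ |\omega_n(N_1\cdots N_m)| < \frac{5}{\alpha_n}\left[(1 + \alpha_n)^{m-1}\left(1 + \frac{\alpha_n}{10}\right) - 1\right] \tag{1.4.14} \]
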
Corresponding numerical bounds on the quantity $|\omega_n(N_1 \cdots N_m)|$ are given in
Table 1.1.


Table 1.1
m      2    3    4    6    8   10
       6   11   17   29   42   56
       6   11   16   26   36   46
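
As a check on the tabulated values, the right-hand member of (1.4.14) can be evaluated directly. The sketch below assumes that (1.4.14) reads $(5/\alpha_n)[(1+\alpha_n)^{m-1}(1+\alpha_n/10) - 1]$ with $\alpha_n = 5 \times 10^{-n}$, and that the tabulated entries are rounded upward to integers; the function name `product_bound` is introduced here for illustration.

```python
from fractions import Fraction
from math import ceil

def product_bound(m, n):
    """Right-hand member of (1.4.14) for m factors rounded to n significant figures."""
    alpha = Fraction(5, 10**n)                       # alpha_n of (1.4.13)
    return (5 / alpha) * ((1 + alpha) ** (m - 1) * (1 + alpha / 10) - 1)

factors = [2, 3, 4, 6, 8, 10]

# Bounds for n = 2, rounded upward to the next integer.
bounds_n2 = [ceil(product_bound(m, 2)) for m in factors]

# As alpha_n tends to zero the bound tends to 5(m - 1) + 1/2.
limits = [ceil(Fraction(5 * (m - 1)) + Fraction(1, 2)) for m in factors]

print(bounds_n2)
print(limits)
```

Under these assumptions the values for $n = 2$ reproduce the first row of Table 1.1, and the limiting values for large $n$ reproduce the second row.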
In the special case when $N_1 = N_2 = \cdots = N_m = N$, there follows

\[ |\omega_n(N^m)| \le \frac{(N^m)^*}{2\alpha_n}\left[\left(1 + \frac{\alpha_n}{N^*}\right)^m - 1\right] \tag{1.4.15} \]

and the worst situation is that in which $(N^*)^m = (N^m)^* = 10-$, when $m$ is a
positive integer, so that

\[ |\omega_n(N^m)| < \frac{5}{\alpha_n}\left[\left(1 + 10^{-1/m}\alpha_n\right)^m - 1\right] \qquad (m = 2, 3, \ldots) \tag{1.4.16} \]

Numerical bounds on $|\omega_n(N^m)|$ are given in Table 1.2.


Table 1.2
m      2    3    4    6    8   10
       4    8   12   23   35   48
       4    7   12   21   31   41


When $P = N^m$, with $m = 1/p$ the reciprocal of a positive integer, so
that the operation involved is that of root extraction, the relation (1.4.15)
again holds; but here the worst case is that in which $N = 10^{(k-m)/m}+$ and
$N^* = 1+$, where $k$ is any integer, so that $(N^m)^* = 10^{1-m}+$, in accordance
with which there follows

\[ |\omega_n(N^m)| < \frac{5}{\alpha_n} \times 10^{-m}\left[(1 + \alpha_n)^m - 1\right] \qquad (m = \tfrac12, \tfrac13, \ldots) \tag{1.4.17} \]

Numerical bounds on $|\omega_n(N^{1/p})|$, where $p$ is a positive integer, are given in
Table 1.3.


Table 1.3
p      2      3      4      8      16     32
       0.79   0.78   0.71   0.47   0.28   0.15




Since, when $N\bar N \ne 0$, there follows

\[ R\left(\frac{1}{N}\right) = -\frac{N}{\bar N}\,R(N) \tag{1.4.18} \]

it is seen that the bounds of Eq. (1.4.14) and Table 1.1 also apply (very nearly)
to division if $m$ is interpreted as the total number of factors in a ratio. Here,
however, special notice should be taken of the formula

\[ E\left(\frac{1}{N}\right) = -\frac{E(N)}{N\bar N} \]

relating the true errors, which is particularly significant when $|N| \ll 1$.
Each of the given bounds applies for all $N$ (with obvious exceptions) but
may be quite conservative in any specific case. Thus, if it is known only that
$|N - 9.61| \le 0.005$, then it can be verified that $|\sqrt{N} - 3.100| < 0.0009$,
whereas Table 1.3 gives a bound of 0.0078. (Here the guaranteed accuracy of
the calculated result is greater than that of the basic data.) Still, none of the
bounds can be appreciably lowered since each is nearly attained in some case.

In illustration, we may note that, if $N = 1.445$ and if $N^6$ is approximated
by $(1.44)^6 = 8.916$, the result differs from the true value $N^6 = 9.103$ by about
19 in units of the third digit. Table 1.2 gives an upper limit of 21. The number
$(106.4)^{1/3}$ should be reliable to three significant figures, according to Table 1.3,
with the fourth digit in error by no more than 1. The calculated value rounds to
4.7385, whereas $(106.35)^{1/3} = 4.7378$ and $(106.45)^{1/3} = 4.7393$. The maximum
error is thus about 0.8 units of the place of the fourth digit, as is just admitted by
Table 1.3, and the last digit of the rounded four-place value, 4.738, is in error
by not more than 1. However, whereas the value actually calculated is in error
by an amount not exceeding $0.8 \times 10^{-3}$, as predicted, the rounded value may
be in error by $1.3 \times 10^{-3}$.

The calculated value of the product


(3.658)(24.765)(1.4345)(72.43)


certainly will be in error by less than 16 units of the place of its fourth significant
digit, by virtue of Table 1.1, under the assumptions that each factor is correctly
rounded to the digits written and that sufficiently many digits are retained in the
calculation itself. However, since the second and third factors each involve
five significant figures, their product alone will be correct within six units of its
fifth digit, so that actually the maximum error is very nearly the same as that
associated with the product of three four-digit numbers and hence will be less
than 11 units in the place of the fourth digit. Clearly (contrary to advice
sometimes given) the procedure of deliberately rounding each of the factors to
four digits before the multiplication would be a wasteful one, since it thus would
increase the maximum possible error. Multiplication actually yields the
calculated value $9.412 \times 10^3$ (to four digits), while the largest and smallest possible
values of the true product are found to round to $9.415 \times 10^3$ and $9.410 \times 10^3$.
Thus the maximum error here is only 3 in the fourth digit. The result of rounding
each factor to four digits before multiplying rounds to $9.407 \times 10^3$, which
hence certainly is in error by at least 3 in the fourth digit and which may be in
error by as much as 8.
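
The four values quoted in this example can be recomputed as follows. This is a sketch under the stated assumption that each factor is in error by at most one-half unit in its last written place; the helpers `round_sig` and `half_unit` are names introduced here, and ties are resolved toward the even digit.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from math import prod

def round_sig(d, n):
    """Round a nonzero Decimal to n significant digits (ties to even)."""
    last_place = d.adjusted() - (n - 1)
    return d.quantize(Decimal(1).scaleb(last_place), rounding=ROUND_HALF_EVEN)

def half_unit(d):
    """One-half unit in the last written place of d."""
    return Decimal(5).scaleb(d.as_tuple().exponent - 1)

factors = [Decimal(s) for s in ("3.658", "24.765", "1.4345", "72.43")]

calculated = prod(factors)                                # product of the factors as written
smallest = prod(f - half_unit(f) for f in factors)        # smallest possible true product
largest = prod(f + half_unit(f) for f in factors)         # largest possible true product
pre_rounded = prod(round_sig(f, 4) for f in factors)      # factors first cut to four digits

for label, value in (("calculated", calculated), ("smallest", smallest),
                     ("largest", largest), ("pre-rounded", pre_rounded)):
    print(label, round_sig(value, 4))
```

The pre-rounded product falls outside the interval of possible true values, which is the sense in which the extra rounding is wasteful.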


1.5 Determinacy of Functions. Error Control 


When any differentiable function $f(N)$ is evaluated with $N$ replaced by an
approximation $\bar N$, the relation

\[ f(N) - f(\bar N) = (N - \bar N)f'(\eta) \qquad (\eta \text{ between } N \text{ and } \bar N) \tag{1.5.1} \]

permits us to deduce that

\[ |E(f(N))| \le |f'|_{\max}\,|E(N)| \tag{1.5.2} \]

and

\[ |R(f(N))| \le \frac{|f'|_{\max}}{|f(N)|}\,|E(N)| \tag{1.5.3a} \]

and also that

\[ |R(f(N))| \le \frac{|N|\,|f'|_{\max}}{|f(N)|}\,|R(N)| \tag{1.5.3b} \]

Analogous results are readily obtained in cases when several independent
variables are involved.
In illustration, if $f(N) = \log_{10} N$, there follows

\[ |E(\log_{10} N)| \le \frac{\log_{10} e}{\eta}\,|E(N)| \qquad [\eta \text{ between } N - E(N) \text{ and } N + E(N)] \]

and hence, if $N \ge 1$ and $|E(N)| < \tfrac12$,

\[ |E(\log_{10} N)| \le \frac{0.44}{1 - |E(N)|}\,|E(N)| < |E(N)| \tag{1.5.4} \]

so that the error in the common logarithm is smaller than the error in its
argument, when that argument exceeds unity. On the other hand,

\[ |E(10^N)| = \frac{10^{\eta}}{\log_{10} e}\,|E(N)| \]

and hence, if $|E(N)| < \tfrac12$,

\[ |R(10^N)| < 2.31 \times 10^{1/2}\,|E(N)| < 8|E(N)| \tag{1.5.5} \]

Thus, the error in $10^N$, expressed in units of the place of its $n$th significant
figure, is less than $8|E(N)| \times 10^n$. Hence, if the error in the common logarithm
is smaller in magnitude than 1 in units of the $n$th decimal place, then the
antilogarithm is in error by less than 8 in units of its $n$th significant figure and hence
is correct to at least $n - 1$ significant figures.

As a further illustration of the use of (1.5.1), we next investigate the degree 
of determinacy of the quantity 

log sin 1.412762 


under the assumption that the argument is a rounded number. The use of
(1.5.1) gives the bound

\[ |E| \le (5 \times 10^{-7})\,|\cot \eta|_{\max} \qquad (1.4127615 < \eta < 1.4127625) \]

on the inherent error $E$, and, since $0.17 > \cot x > 0.15$ for $1.41 < x < 1.42$,
there follows $|E| < 0.85 \times 10^{-7}$, so that the desired quantity is determinate
to within less than one unit in the seventh decimal place. 
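
The bound on the inherent error can be compared with the variation actually observed over the interval of possible arguments. The following is only a sketch in ordinary double-precision arithmetic; the variable names are introduced here for illustration.

```python
import math

x_rounded = 1.412762
half_unit = 5e-7                       # maximum rounding error of the argument
lo, hi = x_rounded - half_unit, x_rounded + half_unit

cot_max = 1.0 / math.tan(lo)           # cot is decreasing here, so its maximum is at lo
bound = half_unit * cot_max            # bound on the inherent error from (1.5.1)

# Largest change in log sin x when the argument moves to either end of the interval.
center = math.log(math.sin(x_rounded))
spread = max(abs(math.log(math.sin(t)) - center) for t in (lo, hi))

print(f"cot max = {cot_max:.5f}, bound = {bound:.3e}, observed spread = {spread:.3e}")
```

The sharper value of $|\cot \eta|_{\max}$ gives a bound somewhat below $0.85 \times 10^{-7}$, and the observed spread does not exceed it.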

In the linear processes of addition and subtraction, the error in the result 
is merely the algebraic sum of the errors in the separate terms, and the magnitude 
of the maximum error is the sum of the maximum magnitudes of the component 
errors. Thus, whereas in multiplication and division we are concerned princi- 
pally with ratios of errors to true quantities, and with the number of significant 
figures, and the absolute position of the decimal point is of importance only in 
fixing the magnitude of the end result, in addition and subtraction the errors 
themselves usually are the important quantities (see Sec. 1.6 for an important 
exception), significant figures are involved only incidentally, and the orientation 
of a digit sequence relative to the decimal point is of importance throughout 
the calculation. 

Thus, if k numbers (positive or negative) are each rounded to n decimal 
places, so that each is in error by an amount less than $5 \times 10^{-n-1}$ in magnitude,
the magnitude of the maximum error in the sum is clearly $5k \times 10^{-n-1}$,
corresponding to the situation in which the signs of the errors are such that 
they combine without cancellation. Accordingly, the result can be in error by 
as much as k/2 units in the nth decimal place. 

† The notation log u, with no base specified, is to be used consistently to denote
$\log_e u$; the arguments of trigonometric functions are always to be expressed in
radians unless degrees are explicitly specified.

Formal addition assigns to the sum


56.434 + 251.37 − 2.6056 + 84.674 − 396.06 + 7.0228


the value 0.8352. However, if each number is correct only to the five significant
figures given, the error in the result can have any value between the limits
±0.0111, so that the result should be recorded as 0.84, with the last digit in
doubt by two units,† and only one correct significant digit can be guaranteed.
Rounding all of the numbers to the two decimal places which are in common,
before addition, would lead to the result 0.82, and would increase the error
limits to ±0.03. Rounding each of the more accurate numbers to one place
beyond the last place of the least-accurate one gives 0.835 with error limits
±0.012, so that the recorded entry is again 0.84 (or $0.8_4$), with the last digit in
doubt by two units, and is a procedure which is generally to be recommended
in such cases.
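
The error limits quoted for this sum follow from adding one-half unit of the last written place of each term. A minimal sketch, with the helper `half_unit` introduced here for illustration, is:

```python
from decimal import Decimal, ROUND_HALF_EVEN

terms = [Decimal(s) for s in ("56.434", "251.37", "-2.6056", "84.674", "-396.06", "7.0228")]

def half_unit(d):
    """One-half unit in the last written place of d."""
    return Decimal(5).scaleb(d.as_tuple().exponent - 1)

formal_sum = sum(terms)
error_limit = sum(half_unit(t) for t in terms)          # errors combining without cancellation

# Rounding every term to the two decimal places held in common before adding.
two_places = [t.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN) for t in terms]
sum_two_places = sum(two_places)
limit_two_places = sum(half_unit(t) for t in two_places)

print(formal_sum, error_limit)
print(sum_two_places, limit_two_places)
```

The second limit counts only one-half unit in the second decimal place of each rounded term, which is the figure of 0.03 used in the text.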

A somewhat similar situation, in which the outcome is, however, reversed, 
is of some importance. Tables of functions often provide a column of differences 
between successive entries, to facilitate linear interpolation, according to the 
formula 


\[ f(x_0 + \theta h) \approx f(x_0) + \theta[f(x_1) - f(x_0)] \qquad (0 < \theta < 1) \tag{1.5.6} \]


where $x_0$ and $x_1$ are successive tabular arguments and $h = x_1 - x_0$. In
constructing such a table, in which (say) all entries are to be rounded to a
certain number of decimal places, the question arises as to whether the number
tabulated for the difference $f(x_1) - f(x_0)$ should represent the rounded value
of the difference or the difference between the rounded values, in those cases
where these values differ. Intuition perhaps would recommend the former
procedure, since it appears to make use of additional information. However,
if $\varepsilon$ represents the maximum roundoff, the maximum error in that case is clearly
$(1 + \theta)\varepsilon$, whereas, since in the second case the right-hand member is properly
to be considered in the form $(1 - \theta)f(x_0) + \theta f(x_1)$, and since $0 < \theta < 1$,
the maximum error in that case is seen to be $(1 - \theta)\varepsilon + \theta\varepsilon = \varepsilon$. Thus the
maximum error is less if the difference of the rounded values is used (for a more
detailed discussion of this question, see Ostrowski [1952]). In particular, the
user of tables which do not explicitly list differences need not regret the fact
that he is forced to employ that procedure when using (1.5.6). The truncation
error associated with that formula is considered in the following chapter.

† The notation $0.8_4$ is often used to indicate that, whereas the 4 is not necessarily
correct, the error associated with 0.84 is less than (or probably less than) that
associated with 0.8.

The loss of significant figures in subtraction is one of the principal sources
of error in numerical analysis, and it is highly desirable to arrange the sequence
of calculations so that such subtractions are avoided, if possible, or so that their
effects are brought into specific evidence. As a simple example, in calculating
$ab - ac = a(b - c)$, where $b$ and $c$ are very nearly equal, the products $ab$ and
$ac$ may have many of their leading digits in common, and the number of
significant figures which must be retained in each product, in order that sufficiently
many correct significant figures will remain in the difference, can be determined
only after both products have been evaluated. This dilemma is avoided if $b - c$
is calculated first. Naturally, if $a$, $b$, and $c$ are specified only to a certain number
of significant figures, and if no roundoffs are introduced, the order of the
calculations is irrelevant from this point of view, and the corresponding degree
of uncertainty in the result merely must be accepted.

Frequently it is possible to exploit special properties of functions involved
in the analysis. Thus, if $a$ and $b$ are nearly equal, it is convenient to replace
$\log b - \log a$ by $\log (b/a)$, $\sin b - \sin a$ by $2 \sin \tfrac12(b - a) \cos \tfrac12(a + b)$, and
$\sqrt{b} - \sqrt{a}$ by $(b - a)/(\sqrt{a} + \sqrt{b})$.

For example, if $4AC \ll B^2$, the quadratic formula is inconvenient for
the determination of the smaller root of the equation $Ax^2 + Bx + C = 0$.
In this case, when $B > 0$, it is desirable to replace the familiar formula
$x_1 = (-B + \sqrt{B^2 - 4AC})/2A$ by the equivalent form

\[ x_1 = \frac{-2C}{B + \sqrt{B^2 - 4AC}} \]

for a specific calculation, or to write

\[ -B + \sqrt{B^2 - 4AC} = -B\left(1 - \sqrt{1 - \frac{4AC}{B^2}}\right) \]

in the original form and to expand the result by the binomial theorem to give

\[ x_1 = -\frac{C}{B}\left(1 + \frac{AC}{B^2} + \cdots\right) \]

when the dependence of $x_1$ on the literal parameters is to be studied.
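
The loss of figures in the familiar formula, and its avoidance in the equivalent form, can be seen in ordinary double-precision arithmetic. The coefficients below are hypothetical values chosen here so that $4AC \ll B^2$ with $B > 0$; the function names are likewise introduced only for illustration.

```python
import math

def smaller_root_naive(a, b, c):
    """Familiar formula: subtracts two nearly equal numbers when 4ac << b^2."""
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

def smaller_root_stable(a, b, c):
    """Equivalent form with the subtraction removed (valid for b > 0)."""
    return -2 * c / (b + math.sqrt(b * b - 4 * a * c))

a, b, c = 1.0, 1.0e8, 1.0      # hypothetical coefficients with 4AC << B^2, B > 0

naive = smaller_root_naive(a, b, c)
stable = smaller_root_stable(a, b, c)
reference = -(c / b) * (1 + a * c / b**2)   # leading terms of the binomial expansion

print(naive, stable, reference)
```

With these coefficients the familiar formula retains hardly any correct figures, whereas the equivalent form agrees with the leading terms of the expansion to nearly full machine precision.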


1.6 Machine Errors 


In the preceding sections it has been implicitly assumed that the relevant sums, 
products, and quotients of exact numbers, or of their rounded approximations, 
are exactly calculable since the purpose has been to determine the extent to 
which the inherent determinacy of a function is reduced by roundoffs in its 
arguments. 

Thus, for example, if $N_1 + N_2 + N_3$ is replaced by $\bar N_1 + \bar N_2 + \bar N_3$,
where the barred quantities are rounded approximations, the error in the sum
is the sum of the errors provided that the term sum has its usual meaning.
However, the machine sum of three numbers, supplied by a digital computer,
may or may not be identical with the true sum, and a similar statement applies
to other operations.

In order to complement the preceding error considerations, we now sup- 
pose that the numbers supplied to the computer are exact and we investigate the 
machine errors. 

If the computer operates in a fixed-point (that is, fixed-decimal-point) 
mode, then usually it deals with positive or negative numbers specified by, say, d 
decimal digits with the decimal point fixed to the left of the leading digit, so that 
each machine number has a magnitude less than 1. Although the computer 
usually can retain a 2d-digit product and also can effect certain other double- 
precision operations, the necessity of choosing appropriate new units (scaling) 
in the formulation of a problem, so that all numbers lie in the interval (—1, 1), 
is a source of considerable inconvenience. Once scaling has been accomplished, 
however, in such a way that all necessary sums, products, and quotients in a 
particular set of calculations are in (—1, 1), then, assuming here that the inputs 
are exact d-digit numbers, the only error introduced by the computer in any 
single operation is the final roundoff (if it is necessary) to d digits. Thus, here the 
machine sum is the same as the true sum, while the d-digit machine product or 
quotient will differ from the true product or quotient by not more than
$5 \times 10^{-d-1}$ (or by an error not exceeding some other multiple of $10^{-d}$ if rounding
is replaced by another truncation process).

In a floating-point mode, a number is expressed in the form $\pm b \times 10^p$ (or
often in a closely related form) where $b$ (the mantissa) is expressed as an $m$-digit
decimal such that $0.1 \le b < 1$ and $p$ (the exponent) is a signed integer, so that
the need for scaling is avoided. However, this advantage is obtained at the cost
of a somewhat more complicated error analysis. To illustrate, we first consider
two examples in the simple case of four-digit floating-point arithmetic, assuming
that there is a double-precision eight-digit accumulator.

For the sum

\[ 0.2946 \times 10^4 + 0.3152 \times 10^2 \]

the addition yields

\[ \begin{array}{r} 0.2946\ 0000 \times 10^4 \\ +\ 0.0031\ 5200 \times 10^4 \\ \hline 0.2977\ 5200 \times 10^4 \end{array} \]

and the true sum is rounded to the machine sum $0.2978 \times 10^4$. The associated
machine error is $-0.48$, and the relative error about $-1.6 \times 10^{-4}$. For the
product

\[ (0.2946 \times 10^4)(0.3152 \times 10^2) \]

the multiplication produces $0.0928\ 5792 \times 10^6$ and a shift, followed by a
rounding, yields the machine product $0.9286 \times 10^5$, the machine error being
$-0.208 \times 10^1$ and the relative error about $-2.2 \times 10^{-5}$.

In these two examples, if the accumulator had only a four-digit capacity
(for the mantissa) and if “chopping” were then used so that all digits beyond the
fourth would merely be dropped, the machine sum would be $0.2977 \times 10^4$
and the machine product $0.9285 \times 10^5$, the relative machine errors becoming
approximately $+1.7 \times 10^{-4}$ and $+8.5 \times 10^{-5}$.
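
These two examples can be imitated with the `decimal` module, whose contexts round the exact result of each operation to a prescribed number of digits. This is only a model of the machine described, assuming an accumulator large enough to hold the exact result before the final rounding or chopping; the context names are introduced here.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_EVEN

rounding = Context(prec=4, rounding=ROUND_HALF_EVEN)   # four-digit mantissa, rounded
chopping = Context(prec=4, rounding=ROUND_DOWN)        # four-digit mantissa, chopped

n1 = Decimal("0.2946E4")
n2 = Decimal("0.3152E2")
true_sum, true_product = n1 + n2, n1 * n2              # exact at the default precision

# Machine error is taken as true value minus machine value.
for name, ctx in (("rounding", rounding), ("chopping", chopping)):
    machine_sum = ctx.add(n1, n2)
    machine_product = ctx.multiply(n1, n2)
    print(name,
          machine_sum, true_sum - machine_sum,
          machine_product, true_product - machine_product)
```

Rounding here produces negative machine errors, since both results are rounded upward, whereas chopping of positive numbers always leaves a positive error.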

In what follows, we will use the symbols $\oplus$, $\ominus$, $\otimes$, and $\oslash$ to denote
machine addition, subtraction, multiplication, and division in the floating mode.
Then it can be seen that

\[ \begin{aligned} N_1 \oplus N_2 &= (N_1 + N_2)(1 + \theta) \\ N_1 \otimes N_2 &= N_1N_2(1 + \theta) \\ N_1 \oslash N_2 &= \frac{N_1}{N_2}(1 + \theta) \end{aligned} \tag{1.6.1} \]

where the value of $\theta$ may vary from one relation to another and may depend
upon $N_1$ and $N_2$ but where in each case

\[ |\theta| \le 5 \times 10^{-m} \tag{1.6.2} \]

where $m$ is the mantissa capacity in decimal digits, assuming that the accumulator
capacity permits rounding to $m$ digits. Otherwise, distinct increased bounds on
each of the $\theta$'s in (1.6.1) result, the increase being particularly significant for
addition.

Accordingly there follows, for example,

\[ (N_1 \oplus N_2) \oplus N_3 = \{[(N_1 + N_2)(1 + \theta_1)] + N_3\}(1 + \theta_2) \]

and also

\[ N_1 \oplus (N_2 \oplus N_3) = N_1(1 + \theta_4) + (N_2 + N_3)(1 + \theta_3)(1 + \theta_4) \]

so that the associative law under addition no longer holds.† We see that the
maximum effect of machine error is minimized if the numbers are added in
order of increasing magnitude, although the distinction becomes important only
if many numbers are to be added.
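
Both remarks can be illustrated in a model of four-digit machine addition. The numbers below are hypothetical values chosen here to make the effect visible, and `machine_sum` is a helper name introduced for this sketch.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

ctx = Context(prec=4, rounding=ROUND_HALF_EVEN)   # four-digit machine addition

def machine_sum(values):
    """Add the values from left to right, rounding to four digits after each step."""
    total = Decimal(0)
    for v in values:
        total = ctx.add(total, v)
    return total

large, small = Decimal("1000"), Decimal("0.4")

left = ctx.add(ctx.add(large, small), small)      # (N1 + N2) + N3
right = ctx.add(large, ctx.add(small, small))     # N1 + (N2 + N3)

values = [large] + [small] * 10                   # true sum is 1004
decreasing = machine_sum(sorted(values, reverse=True))
increasing = machine_sum(sorted(values))

print(left, right)
print(decreasing, increasing)
```

When the large term is taken first, each small term is lost in the rounding; when the small terms are accumulated first, their sum survives and the true value 1004 is obtained.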


† For a computer with a double-precision accumulator, the partial sum $N_1 + N_2$
would be truncated (if necessary) to $2m$ digits in the first case so that $|\theta_1|$ would not
exceed a small multiple of $10^{-2m}$, while the final sum would be rounded to $m$ digits
so that $|\theta_2| \le 5 \times 10^{-m}$. Similar statements apply to $\theta_3$ and $\theta_4$.

Further, there follows

\[ (N_1 \otimes N_2) \otimes N_3 = [N_1N_2(1 + \theta_1)]N_3(1 + \theta_2) = N_1N_2N_3(1 + \theta_1)(1 + \theta_2) \]

and

\[ N_1 \otimes (N_2 \otimes N_3) = N_1[N_2N_3(1 + \theta_3)](1 + \theta_4) = N_1N_2N_3(1 + \theta_3)(1 + \theta_4) \]

Although these two machine products are generally unequal, their error bounds
(that is, bounds on their deviations from the true value $N_1N_2N_3$) are equal. No
similar statement applies to the two preceding machine sums.

However, it should be reemphasized that here, for simplicity, we have
supposed that the $m$-digit numbers supplied to the computer are exact and have
then considered the error introduced by machine operations. Suppose now that
two numbers $N_1$ and $N_2$ are replaced by approximations $\bar N_1 = N_1 - \varepsilon_1$ and
$\bar N_2 = N_2 - \varepsilon_2$, where the errors $\varepsilon_1$ and $\varepsilon_2$ may be due to a rounding to $m$
(or fewer) digits or to other sources, such as inaccuracies in observation or in
experimental determination.

The machine sum of $\bar N_1$ and $\bar N_2$ is then given by

\[ \bar N_1 \oplus \bar N_2 = [(N_1 - \varepsilon_1) + (N_2 - \varepsilon_2)](1 + \theta) \]

and hence

\[ (N_1 + N_2) - (\bar N_1 \oplus \bar N_2) = (\varepsilon_1 + \varepsilon_2) - \theta(N_1 + N_2) + \theta(\varepsilon_1 + \varepsilon_2) \tag{1.6.3} \]

This total error is made up of the term $\varepsilon_1 + \varepsilon_2$, which is the roundoff error
considered in the preceding sections, the term $-\theta(N_1 + N_2)$, which is the machine
error currently under discussion, and the coupling term $\theta(\varepsilon_1 + \varepsilon_2)$. Whereas it
could happen that the first two terms nearly cancel so that the coupling term
is in fact dominant, insofar as predictable error bounds are concerned, if it is
known that $|\varepsilon_1|$ and $|\varepsilon_2|$ cannot exceed $\varepsilon$, the best bounds on the three terms are
$2\varepsilon$, $(5 \times 10^{-m})|N_1 + N_2|$, and $10\varepsilon \times 10^{-m}$, and the third bound is negligible
relative to at least the first one. Hence, from this point of view, the previously
considered errors and the machine errors can be linearly superimposed in this
case.
When the operation is multiplication, there follows 


N,N, — (N, ON2) = (ie. + Npe;) — ΘΝ.Ν, + Oe, + 62) 


Again the third term is negligible and the relative error is approximately the 
sum of the relative errors of the operands and the relative machine error. 

In the sequel, mention seldom will be made of machine errors, as defined
here. It may be noted that generally they are of importance only when the
overall error tolerance is of the order of $10^{-m}$ so that full machine capacity is
essential. Otherwise, they become of significance for the determination of error
bounds when a specific computation requires $n$ basic machine operations where
$n$ is so large that $n \times 10^{-m}$ is of the order of the error tolerance [or for statistical
error estimates when the same is true of $n^{1/2} \times 10^{-m}$ (see Sec. 1.7)]. In such
cases, rigorous error analysis may be unpleasant and linear superposition of
machine-error effects on other errors may not be appropriate.

Finally, it should be noted that when values of a function $F(N)$, such as
$N^{1/2}$ or $e^N$, are generated within the computer (perhaps by use of a preselected
iterative process) instead of being supplied as inputs to the computer, machine
errors again generally are introduced and, of course, then may be large relative
to $10^{-m}$.


1.7 Random Errors 


If 1,000 positive numbers, each rounded to n decimal places, were added, the 
total error due to roundoff could amount to 500 units in the last place of the 
sum. Whereas this maximum error could be attained only in the case when all 
numbers were rounded in the same direction, by exactly one-half unit, the 
possibility of its occurrence forces us to accept its value as the least upper 
bound on the possible error. 

However, the price of certainty in such a case is a high one, and in most 
situations it cannot be tolerated. Furthermore, in a great number of practical 
cases certainty cannot be attained. Thus each member of a set of 1,000 numbers, 
to be added, may itself represent the mean of a set of empirical values of a 
physical quantity, in which case one generally cannot guarantee that the error 
associated with it is less than, say, $5 \times 10^{-n-1}$, but can only estimate the
probability that this is the case. 

In most such cases it is assumed that the errors are symmetrically
distributed about a zero mean and that, in a sufficiently large set of measurements,
the probability of the occurrence of an error between $x$ and $x + dx$ is, to a first
approximation, of the form

\[ \phi(x)\,dx = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2}\,dx \tag{1.7.1} \]

where $\sigma$ is a constant parameter, to be adjusted to the observations. The
function $\phi$ is called the frequency function of the distribution. The probability
that an error not exceed $x$ algebraically is then given by the normal distribution
function

\[ \Phi(x) = \int_{-\infty}^{x} \phi(t)\,dt = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x} e^{-t^2/2\sigma^2}\,dt \tag{1.7.2} \]

the numerical coefficient in (1.7.1) having been determined so that $\Phi(\infty) = 1$,

\[ \int_{-\infty}^{\infty} \phi(t)\,dt = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{-t^2/2\sigma^2}\,dt = 1 \tag{1.7.3} \]

in accordance with the requirement of unit probability that any error lie
somewhere in $(-\infty, \infty)$.
Further, the probability $P(x)$ that an error chosen at random lie between
$-|x|$ and $+|x|$, that is, that its magnitude not exceed $|x|$, is clearly given by

\[ P(x) = \Phi(|x|) - \Phi(-|x|) = \int_{-|x|}^{|x|} \phi(t)\,dt = 2\int_{0}^{|x|} \phi(t)\,dt \]

or

\[ P(x) = \frac{\sqrt{2}}{\sqrt{\pi}\,\sigma}\int_{0}^{|x|} e^{-t^2/2\sigma^2}\,dt \tag{1.7.4} \]

whereas the probability that it exceed $|x|$ in magnitude is $Q(x) = 1 - P(x)$.
Equation (1.7.4) can also be written in the form

\[ P(x) = \frac{2}{\sqrt{\pi}}\int_{0}^{|x|/\sqrt{2}\sigma} e^{-t^2}\,dt = \operatorname{erf}\frac{|x|}{\sqrt{2}\,\sigma} \tag{1.7.5} \]

in terms of the error function (see Prob. 12).

Details must be omitted here with respect to the wide class of situations
in which the use of this so-called normal-distribution law is justifiable, but the
literature on this subject is extensive (for example, see Feller [1968]). In
particular, even though the frequency distribution of the errors in a single quantity
may not be capable of good approximation by a normal frequency distribution,
of the form specified by (1.7.1), it generally is true that, when many such
independent component errors are compounded, the resultant distribution can
be so approximated.

The parameter $\sigma$ is called the standard deviation of the distribution and
its square $\sigma^2$ is called the variance. It is easily seen that the points of inflection
of the curve representing $\phi(\varepsilon)$ lie at distance $\sigma$ on each side of the maximum
at $\varepsilon = 0$. The parameter $h = 1/(\sqrt{2}\,\sigma)$ is called the modulus of precision and
is a measure of the steepness of the frequency curve near its peak at the origin.
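
If $\varepsilon$ is a random variable, the mean value (or expected value) of any function
$g(\varepsilon)$, relative to the assumed distribution, is given by

\[ (g(\varepsilon))_{\text{mean}} = \int_{-\infty}^{\infty} \phi(\varepsilon)g(\varepsilon)\,d\varepsilon = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{-\varepsilon^2/2\sigma^2}g(\varepsilon)\,d\varepsilon \tag{1.7.6} \]

under the assumption that this integral exists. In particular, since $\phi(\varepsilon)$ is an
even function of $\varepsilon$, we verify directly that the mean value of $\varepsilon$ itself then is indeed
zero, and we find also, for example, that

\[ |\varepsilon|_{\text{mean}} = 2\int_{0}^{\infty} \varepsilon\,\phi(\varepsilon)\,d\varepsilon = \sqrt{\frac{2}{\pi}}\,\sigma \tag{1.7.7} \]

and

\[ (\varepsilon^2)_{\text{mean}} = 2\int_{0}^{\infty} \varepsilon^2\phi(\varepsilon)\,d\varepsilon = \sigma^2 \tag{1.7.8} \]

Mean values of higher powers of $|\varepsilon|$ can be expressed similarly, in terms of the
parameter $\sigma$.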

Thus, this parameter could be determined in such a way that any one of
these “moments,” thus calculated for an assumed normal distribution, is made
to equal the corresponding moment of the distribution actually under
consideration, if that moment could be calculated or approximated. It happens
that the choice of the second moment leads to the most convenient analysis and
also is recommended by certain theoretical considerations. Thus we specify
the parameter $\sigma$ of the approximating normal distribution (1.7.1) in such a way
that it is equal to the square root of the mean of the squared errors in the true
distribution,

\[ \sigma = \varepsilon_{\text{RMS}} \tag{1.7.9} \]

In general, the root-mean-square error $\varepsilon_{\text{RMS}}$ for the entire distribution can be
estimated only from a sample of, say, the deviations of $k$ measurements from
their mean value, and an appropriate estimate is then afforded by the formula†

$$\varepsilon_{\text{RMS}} \approx \sqrt{\frac{1}{k}\left(\varepsilon_1^2 + \varepsilon_2^2 + \cdots + \varepsilon_k^2\right)} \qquad (1.7.10)$$


Having obtained such an approximation to $\sigma$, one can make use of
Eq. (1.7.4) to estimate the probability that the magnitude of a random error


† A theoretically better estimate, which tends to take into account the probable
deviation of the mean of the observations from the unknown true mean, is obtained 
by replacing 1/k by 1/(k — 1) in (1.7.10). This modification is of practical significance 
only when k is relatively small, in which cases the validity of the statistical analysis 
itself may be open to question. 
exceed (or not exceed) a certain specified amount. A few useful values of $1 - P$
are listed in Table 1.4. Thus, the probability of an error of magnitude greater
than $\varepsilon_{\text{RMS}}$ is 0.317. Only 20 percent of the errors should exceed $1.282\,\varepsilon_{\text{RMS}}$,
10 percent should exceed $1.645\,\varepsilon_{\text{RMS}}$, and 1 percent should exceed $2.576\,\varepsilon_{\text{RMS}}$ if
the distribution is sufficiently nearly normal.


Table 1.4
|ε|/ε_RMS    1 − P(ε)
0.674        0.500
0.842        0.400
1.000        0.317
1.036        0.300
1.282        0.200
1.645        0.100
2.576        0.010
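
The estimate (1.7.10) and the entries of Table 1.4 can be checked numerically. The following Python sketch is illustrative only: the function names `rms_estimate` and `exceed_probability` are hypothetical, and the relation $1 - P = \operatorname{erfc}(r/\sqrt{2})$, with $r = |\varepsilon|/\varepsilon_{\text{RMS}}$, is assumed from the normal law (1.7.1) with $\sigma = \varepsilon_{\text{RMS}}$.

```python
import math

def rms_estimate(errors, small_sample=False):
    """Estimate the RMS error from sample deviations, as in (1.7.10).

    With small_sample=True the divisor k is replaced by k - 1,
    the modification mentioned in the footnote to (1.7.10).
    """
    k = len(errors)
    divisor = k - 1 if small_sample else k
    return math.sqrt(sum(e * e for e in errors) / divisor)

def exceed_probability(ratio):
    """Probability 1 - P that |error| exceeds ratio * RMS under a normal law."""
    return math.erfc(ratio / math.sqrt(2.0))

if __name__ == "__main__":
    # Left column of Table 1.4; the printed values can be compared with the right column.
    for r in (0.674, 0.842, 1.000, 1.036, 1.282, 1.645, 2.576):
        print(f"{r:5.3f}  {exceed_probability(r):5.3f}")
```

For samples with small $k$, calling `rms_estimate(errors, small_sample=True)` applies the $1/(k - 1)$ divisor instead.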

The number $0.67449\sigma$ is often called the probable error of the distribution.
It should be noticed that this is merely that number which should be exceeded 
by the magnitude of half the errors; it is in no sense the most probable error, 
as the name tends to suggest. 

If the approximation (1.7.10) were calculated for a large number of sets
of samples, each containing $k$ errors chosen at random from the same distribution,
and if the mean of the estimates were selected as the best approximation
to the true $\varepsilon_{\text{RMS}}$, the deviations of the various estimates from this best one would
also be normally distributed, to a first approximation, with an RMS value of
$\varepsilon_{\text{RMS}}/\sqrt{2k}$, when $k$ is sufficiently large. This fact is often useful in estimating the
reliability of the estimated value of $\varepsilon_{\text{RMS}}$.

Now suppose that $\varepsilon$ is the sum of two independent errors $u$ and $v$, each
of which varies about a zero mean. Then the mean value of $\varepsilon^2$ is the sum
of the mean values of $u^2$, $2uv$, and $v^2$. But, since $u$ and $v$ are independent, the
mean of $uv$ is the product of the means of $u$ and $v$ and hence is zero. Thus there
follows

$$\varepsilon_{\text{RMS}}^2 = u_{\text{RMS}}^2 + v_{\text{RMS}}^2 \qquad (1.7.11)$$


This argument generalizes to show that the RMS value of the sum of n independent 
errors (each having a zero mean) is the square root of the sum of the squares of 
the RMS values of the component errors. 

It can be shown (see Prob. 26) that the normal-distribution law has the
property that, if $u$ and $v$ are independent and normally distributed, with standard
deviations $\sigma_u$ and $\sigma_v$, then $\varepsilon = u + v$ is also normally distributed, with standard
deviation $\sigma = \sqrt{\sigma_u^2 + \sigma_v^2}$. Thus if, in accordance with (1.7.9), we identify
$\sigma_u$ with $u_{\text{RMS}}$ and $\sigma_v$ with $v_{\text{RMS}}$, it will follow also that $\sigma_{u+v} = (u + v)_{\text{RMS}}$.
In illustration, if each of the numbers in the sum

    426.44 − 43.26 + 2.72 + 9.61 − 104.26 − 218.72

represents the mean of a set of observations, and if the (approximate) RMS
error associated with each is, say, 0.05, then the formal sum 72.53 would possess
an RMS error of $\sqrt{6}\,(0.05) \approx 0.12$. Such a result is often recorded as 72.53 ±
0.12, although some writers use the probable error (0.674)(0.12) ≈ 0.08, and
write 72.53 ± 0.08, while still others use the notation $N \pm d$ to indicate that $d$
is the maximum error in $N$ (which would be undefined in the present case).
None of these three conventions will be used here.
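
The root-sum-square rule (1.7.11), extended to several independent errors, is easy to apply mechanically. A minimal Python sketch follows, using the six-term sum of the illustration; the helper name `rms_of_sum` is hypothetical.

```python
import math

def rms_of_sum(component_rms):
    """RMS error of a sum of independent zero-mean errors (root-sum-square)."""
    return math.sqrt(sum(r * r for r in component_rms))

terms = [426.44, -43.26, 2.72, 9.61, -104.26, -218.72]
total = sum(terms)
rms = rms_of_sum([0.05] * len(terms))   # each term carries an RMS error of 0.05
print(f"{total:.2f} with RMS error {rms:.2f}")
```

The same helper accepts unequal component errors, for example `rms_of_sum([0.05, 0.02, 0.10])`.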

If we consider the error $\varepsilon$ which arises from rounding a number to $n$
decimal places, it is clear that the distribution of values of $\varepsilon$ will not be well
approximated by any normal distribution, since here the frequency function
has the constant value $1/(2|\varepsilon|_{\max})$ when $|\varepsilon| < |\varepsilon|_{\max} = 5 \times 10^{-n-1}$ and the value
zero otherwise. However, the distribution function corresponding to errors
which are (exactly or approximately) linear combinations of many such errors
generally will be appropriate for approximation by a normal-distribution
function. Thus, in such cases, the error analysis may be based with some confidence
upon the result of treating the individual errors as though they were
normally distributed. (See Prob. 27.)

For this purpose, we may notice that if $x$ takes on all values between
$-\tfrac12$ and $\tfrac12$, and if all those values are equally likely, the RMS value of $x$ is

$$\left( \int_{-1/2}^{1/2} x^2\, dx \right)^{1/2} = \frac{\sqrt{3}}{6} \approx 0.2887$$

Hence, if $\varepsilon$ is the roundoff error due to rounding to the $n$th decimal place, there follows

$$\varepsilon_{\text{RMS}} = 0.2887 \times 10^{-n} \qquad (1.7.12)$$


Thus, if $k$ numbers are each rounded to $n$ decimal places, the error in the sum
of the results can be considered to be normally distributed, with an RMS value
of $0.2887\sqrt{k} \times 10^{-n}$, if $k$ is not too small.

In particular, when 1,000 such numbers are added, the RMS error in the 
sum is less than 10 units in the nth place. According to Table 1.4, the probability 
of an error of 17 units is less than 0.1, and the odds are 99 to 1 that the error will 
not exceed 26 units. Nevertheless, an error of 500 units in the nth place is 
indeed possible. 
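
The figures quoted for 1,000 roundoffs can be reproduced directly. The Python sketch below computes the predicted RMS error from (1.7.12) and, as a loose empirical check, simulates sums of rounded numbers. The names `predicted_rms` and `simulated_rms`, the choice of two decimal places, the range of the random numbers, and the seed are all assumptions made for illustration.

```python
import math
import random

UNIT_RMS = 1.0 / math.sqrt(12.0)  # RMS of an error uniform on [-1/2, 1/2], about 0.2887

def predicted_rms(k):
    """Predicted RMS error of a sum of k roundoffs, in units of the nth place."""
    return UNIT_RMS * math.sqrt(k)

def simulated_rms(k, trials, n=2, seed=0):
    """Empirical RMS error, in units of 10**(-n), of sums of k rounded numbers."""
    rng = random.Random(seed)
    scale = 10 ** n
    total_sq = 0.0
    for _ in range(trials):
        err = 0.0
        for _ in range(k):
            x = rng.uniform(0.0, 100.0)
            err += (round(x, n) - x) * scale   # roundoff of one number, in units of the nth place
        total_sq += err * err
    return math.sqrt(total_sq / trials)

if __name__ == "__main__":
    print(predicted_rms(1000))          # a little over 9 units
    print(1.645 * predicted_rms(1000))  # exceeded with probability 0.1
    print(2.576 * predicted_rms(1000))  # exceeded with probability 0.01
    print(simulated_rms(1000, 200))     # should be of comparable size to the prediction
```

The simulated value fluctuates with the seed and the number of trials, so only rough agreement with the prediction should be expected.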

In accordance with such considerations, it is rather conventional to obtain
a "realistic" estimate of the possible overall error due to $k$ roundoffs, when $k$
is fairly large, by replacing $k$ by $\sqrt{k}$ in an expression for (or an estimate of) the
maximum resultant error.
1.8 Recursive Computation

Frequently it is convenient to evaluate a function recursively, that is, by use of a
recurrence formula. For example, if the polynomial

$$f(x) = \sum_{k=0}^{n} C_k x^k = C_n x^n + C_{n-1} x^{n-1} + \cdots + C_1 x + C_0 \qquad (1.8.1)$$


were to be evaluated for a specific value of $x$ by calculating the powers $x^2$,
$x^3, \ldots, x^n$ and forming their linear combination, a total of $2n - 1$ multiplications
generally would be needed. On the other hand, if the calculation is based
on the "nested" grouping

$$f(x) = x\{x[\cdots x(xC_n + C_{n-1}) + C_{n-2} \cdots] + C_1\} + C_0 \qquad (1.8.2)$$

it is seen that not more than $n$ multiplications are needed.
The latter calculation can be systematized (for example) by use of the
recurrence formula

$$u_k = x u_{k+1} + C_k \qquad (k = n, n - 1, \ldots, 0) \qquad (1.8.3)$$

with

$$u_{n+1} = 0 \qquad (1.8.4)$$

so that a sequence $u_n, u_{n-1}, \ldots, u_0$ is determined recursively,

$$u_n = C_n \qquad u_{n-1} = xC_n + C_{n-1} \qquad u_{n-2} = x u_{n-1} + C_{n-2}$$

and so forth, and the term $u_0$ provides the desired evaluation

$$\sum_{k=0}^{n} C_k x^k = u_0 \qquad (1.8.5)$$


This process is often called Horner’s method (but is due to Newton) and is 
equivalent to the use of so-called synthetic division. (See Sec. 10.14, where 
$C_n = 1$ and $C_k = a_{n-k}$.)
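
The recurrence (1.8.3) with the starting value (1.8.4) translates almost line for line into code. The following Python sketch is one possible rendering; the function name `horner` and the convention that `coeffs[k]` holds $C_k$ are assumptions. Starting literally from $u_{n+1} = 0$ costs one trivial extra multiplication compared with the count given above.

```python
def horner(coeffs, x):
    """Evaluate sum(C_k * x**k) by the recurrence u_k = x*u_{k+1} + C_k.

    coeffs[k] holds C_k, so coeffs = [C_0, C_1, ..., C_n].
    """
    u = 0.0                      # u_{n+1} = 0, as in (1.8.4)
    for c in reversed(coeffs):   # k = n, n-1, ..., 0
        u = x * u + c            # (1.8.3)
    return u                     # u_0, the value of the polynomial (1.8.5)

print(horner([1.0, 2.0, 3.0], 2.0))   # 1 + 2*2 + 3*4 = 17.0
```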

More generally, suppose that a sum of the form

$$f(x) = \sum_{k=0}^{n} C_k \phi_k(x) \qquad (1.8.6)$$

is to be evaluated for a specific value of $x$, where $\phi_k$ itself satisfies a recurrence
formula of the form

$$\phi_{k+1} + \alpha_k \phi_k + \beta_k \phi_{k-1} = 0 \qquad (k = 0, 1, 2, \ldots) \qquad (1.8.7)$$

in which $\alpha_k$ and/or $\beta_k$ depend upon $x$ as well as $k$. (The preceding simple case,
where $\phi_k = x^k$, is obtained by taking $\phi_{-1} = 0$, $\phi_0 = 1$, $\alpha_k = -x$, and
$\beta_k = 0$, so that (1.8.7) then is a two-term formula.) If a sequence $u_n, u_{n-1}, \ldots, u_0$
is defined by the associated recurrence formula


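$$u_k + \alpha_k u_{k+1} + \beta_{k+1} u_{k+2} = C_k \qquad (k = n, n - 1, \ldots, 0) \qquad (1.8.8)$$

with

$$u_{n+1} = u_{n+2} = 0 \qquad (1.8.9)$$

it follows that

$$\begin{aligned}
\sum_{k=0}^{n} C_k \phi_k &= \sum_{k=0}^{n} (u_k + \alpha_k u_{k+1} + \beta_{k+1} u_{k+2})\, \phi_k \\
&= \sum_{k=2}^{n} (\phi_k + \alpha_{k-1} \phi_{k-1} + \beta_{k-1} \phi_{k-2})\, u_k + \phi_0 u_0 + (\phi_1 + \alpha_0 \phi_0)\, u_1
\end{aligned}$$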


Hence, since the coefficient of $u_k$ in the last sum vanishes for each relevant $k$,
by virtue of (1.8.7), we find that

$$\sum_{k=0}^{n} C_k \phi_k = \phi_0 u_0 + (\phi_1 + \alpha_0 \phi_0)\, u_1 \qquad (1.8.10)$$

This formula properly reduces to (1.8.5) in the preceding case.
In the special cases when $\beta_0 = 0$ or $\phi_{-1} = 0$ in (1.8.7), the sum reduces
to $\phi_0 u_0$.
As an example, for the purpose of evaluating the sum

$$f(x) = \sum_{k=0}^{n} C_k \cos kx \qquad (1.8.11)$$

we may note that $\phi_k = \cos kx$ satisfies the relation

$$\phi_{k+1} - 2\phi_k \cos x + \phi_{k-1} = 0 \qquad (k = 0, 1, \ldots, n) \qquad (1.8.12)$$

with $\phi_{-1} = \cos x$ and $\phi_0 = 1$. Hence, if $u_n, u_{n-1}, \ldots, u_0$ are determined
recursively from the relation

$$u_k - 2u_{k+1} \cos x + u_{k+2} = C_k \qquad (k = n, n - 1, \ldots, 0) \qquad (1.8.13)$$

with

$$u_{n+1} = u_{n+2} = 0 \qquad (1.8.14)$$

there follows

$$\sum_{k=0}^{n} C_k \cos kx = u_0 - u_1 \cos x \qquad (1.8.15)$$

(Other important uses of the preceding result are indicated in Chaps. 7 and 9.)
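
The cosine sum gives a compact test of the recurrence (1.8.13) and the closing formula (1.8.15). The Python sketch below is a minimal rendering under the assumption that `coeffs[k]` holds $C_k$; the function name `cosine_sum` and the sample coefficients are hypothetical. The recursive result is compared with direct summation.

```python
import math

def cosine_sum(coeffs, x):
    """Evaluate sum(C_k * cos(k*x)) for coeffs = [C_0, ..., C_n] via (1.8.13)-(1.8.15)."""
    c = math.cos(x)
    u1 = u2 = 0.0                        # u_{k+1} and u_{k+2}; both zero at k = n (1.8.14)
    for ck in reversed(coeffs):          # k = n, n-1, ..., 0
        u0 = ck + 2.0 * c * u1 - u2      # (1.8.13) solved for u_k
        u1, u2 = u0, u1
    # After the loop, u1 holds u_0 and u2 holds u_1.
    return u1 - u2 * c                   # (1.8.15)

coeffs = [0.5, -1.0, 0.25, 2.0]
x = 0.7
direct = sum(ck * math.cos(k * x) for k, ck in enumerate(coeffs))
print(cosine_sum(coeffs, x), direct)
```

Only one cosine evaluation is needed in the recursive form, against $n$ in the direct sum.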


Although computation of this sort, based on a linear recurrence formula,
often is a particularly efficient process, it should be noted that sometimes in
numerical work the propagation of roundoff errors is so unfavorable that the
process fails. A well-known example is that in which use is made of the relation

$$J_{k+1}(x) = \frac{2k}{x} J_k(x) - J_{k-1}(x) \qquad (1.8.16)$$

for the purpose of computing values of the Bessel function $J_k(x)$ for specific
values of $x$ from known rounded values of $J_0(x)$ and $J_1(x)$. When $x = 1$, if
use is made of the five-place values

$$J_0(1) = 0.76520 \qquad J_1(1) = 0.44005$$

the approximate values generated are listed in the following table, together with
rounded true values:


          J_k(1)
k    Approx      Rounded
2    0.11490     0.11490
3    0.01955     0.01956
4    0.00240     0.00248
5   −0.00035     0.00025
6   −0.00590     0.00002


The apparent divergence might have been anticipated, assuming knowledge
in advance of the fact that $J_k(1)$ rapidly tends to zero as $k$ increases. Since, when
$x = 1$, the recurrence formula becomes

$$J_{k+1}(1) = 2k J_k(1) - J_{k-1}(1) \qquad (1.8.17)$$

it follows that if roundoff errors of magnitude $\varepsilon$ could occur in the approximations
to $J_0(1)$ and $J_1(1)$, then the corresponding absolute error in $J_2(1)$ could
exceed $2\varepsilon$, that error in $J_3(1)$ could exceed $4(2\varepsilon) = 8\varepsilon$, and, more generally,
the magnitude of the error in $J_{k+1}(1)$ then could exceed $2^k k!\, \varepsilon$. Thus, since
$J_{k+1}(1)$ itself decreases rapidly in magnitude as $k$ increases, a rapid growth in
the relative error should have been feared.
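
The forward use of (1.8.17) from the five-place starting values is short enough to reproduce. In this Python sketch the function name `forward_bessel` is hypothetical; the generated values can be compared with the "Approx" column of the table in this section.

```python
def forward_bessel(j0, j1, kmax):
    """Generate approximations to J_0(1), ..., J_kmax(1) by forward use of (1.8.17)."""
    values = [j0, j1]
    for k in range(1, kmax):
        # J_{k+1}(1) = 2k J_k(1) - J_{k-1}(1)
        values.append(2 * k * values[k] - values[k - 1])
    return values

approx = forward_bessel(0.76520, 0.44005, 6)
for k in range(2, 7):
    print(k, f"{approx[k]: .5f}")
```

The sign change at $k = 5$ is produced entirely by the starting errors of a few units in the sixth decimal place, amplified as described above.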

The source of the difficulty here also can be described as follows. Since
the Bessel function $Y_k(x)$ of the second kind also satisfies a recurrence formula
of form (1.8.16), it follows that the recurrence formula

$$\phi_{k+1} = 2k\phi_k - \phi_{k-1} \qquad (1.8.18)$$

is satisfied by

$$\phi_k = A J_k(1) + B Y_k(1) \qquad (1.8.19)$$

when $k = 0, 1, 2, \ldots$, where $A$ and $B$ here are determined by the specified
values of $\phi_0$ and $\phi_1$. Unless the ratio of these two starting values is exactly
the ratio of $J_0(1)$ and $J_1(1)$, the value of $B$ will not vanish. Hence, since $Y_k(1)$
increases rapidly as $k$ increases, the "parasitic" term $BY_k(1)$ eventually will
dominate the desired term $AJ_k(1)$.
A third equivalent appraisal of the situation would result from writing

$$\theta_k = \frac{\phi_{k+1}}{\phi_k} \qquad (1.8.20)$$

in (1.8.18) to obtain the new recurrence formula

$$\theta_k = 2k - \frac{1}{\theta_{k-1}} \qquad (1.8.21)$$

This relation suggests that when $k$ is large, there follows

$$\theta_k^2 - 2k\theta_k + 1 \approx 0 \qquad (1.8.22)$$

and hence†

$$\theta_k \sim k \pm \sqrt{k^2 - 1} \qquad (k \to \infty) \qquad (1.8.23)$$

Thus one solution of (1.8.18), namely, $Y_k(1)$, grows as $k$ increases in such a way
that $Y_{k+1}(1) \sim 2kY_k(1)$; the second solution, namely, the desired one $J_k(1)$,
tends to zero as $k \to \infty$ in such a way that $J_{k+1}(1) \sim (2k)^{-1} J_k(1)$.

One method of overcoming this difficulty consists of applying the formula
(1.8.17) in the opposite direction, with unknown values of $J_N(1)$ and $J_{N+1}(1)$
carried as literal parameters to be determined so that the values of $J_1(1)$ and
$J_0(1)$ so generated agree appropriately with their known true values, since the
parasitic solution $BY_k(1)$ then rapidly damps out as the recursion proceeds.

A somewhat simpler procedure, for numerical work, chooses $N$ so large
that $J_N(1)$ certainly is smaller than $5 \times 10^{-r-1}$, where $r$ is the required number
of significant digits in $J_k(1)$, then starts the backward recursion with $J_N(1) = 0$,
$J_{N-1}(1) = A$ and determines $A$ such that a generated value [say, $J_0(1)$] agrees
suitably with a known value. Here, in fact, one can merely assign a convenient
fictitious value (say, $A = 10^{-r}$) to $J_{N-1}(1)$ and scale up the generated results
of interest in the proper ratio at the end of the process.
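
The simpler backward procedure can be sketched as follows. This Python fragment is an illustration under stated assumptions: the function name `backward_bessel`, the starting index $N = 15$, and the fictitious value 1.0 for $J_{N-1}(1)$ are choices made here, and the scale is fixed by matching the generated $J_0(1)$ to the five-place value 0.76520.

```python
def backward_bessel(n_start, known_j0, kmax):
    """Backward use of (1.8.17), J_{k-1} = 2k J_k - J_{k+1}, followed by rescaling.

    n_start is the index N at which the trial value 0 is assigned;
    known_j0 is a known value of J_0(1) used to fix the scale.
    """
    trial = [0.0] * (n_start + 1)      # trial[N] = 0 stands for J_N(1)
    trial[n_start - 1] = 1.0           # arbitrary fictitious value for J_{N-1}(1)
    for k in range(n_start - 1, 0, -1):
        trial[k - 1] = 2 * k * trial[k] - trial[k + 1]
    scale = known_j0 / trial[0]        # make the generated J_0(1) agree with the known value
    return [scale * t for t in trial[: kmax + 1]]

values = backward_bessel(15, 0.76520, 6)
for k, v in enumerate(values):
    print(k, f"{v:.5f}")
```

In contrast with the forward recursion, the values generated for $k = 5$ and $k = 6$ remain positive and agree with the rounded true values of the table to the places shown.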


1.9 Mathematical Preliminaries 


In this section, we list certain analytical results to which reference occasionally 
will be made in the sequel. Proofs of most are omitted. First, it is noted that 
in most of the following chapters it is supposed that all functions dealt with are 
real and continuous in the range considered and, in addition, that they possess 
as many continuous derivatives as the analysis may require. 


† The notation $u_k \sim v_k$ $(k \to \infty)$ is used to indicate that $\lim_{k \to \infty} (v_k/u_k) = 1$.
The basic fact that a function $f(x)$ which is continuous for $a \le x \le b$ takes
on each value between $f(a)$ and $f(b)$ is intuitively "obvious" but is capable of
rigorous proof. Two immediate consequences of this result are the following:


Theorem 1 If $f(x)$ is continuous for $a \le x \le b$, and if $f(a)$ and $f(b)$
are of opposite sign, then $f(\xi) = 0$ for at least one number $\xi$ such that
$a < \xi < b$.


Theorem 2 If $f(x)$ is continuous for $a \le x \le b$, and if $\lambda_1$ and $\lambda_2$ are
positive constants, then $\lambda_1 f(a) + \lambda_2 f(b) = (\lambda_1 + \lambda_2) f(\xi)$ for at least one
$\xi$ such that $a \le \xi \le b$.


If also $f'(x)$ exists, two additional results can be established:

Theorem 3 If $f(x)$ is continuous for $a \le x \le b$ and $f'(x)$ exists for
$a < x < b$, and if $f(a) = f(b) = 0$, then $f'(\xi) = 0$ for at least one $\xi$
such that $a < \xi < b$. (This is Rolle's theorem.)

Theorem 4 If $f(x)$ is continuous for $a \le x \le b$ and $f'(x)$ exists for
$a < x < b$, then $f(b) - f(a) = (b - a) f'(\xi)$ for at least one $\xi$ such that
$a < \xi < b$. (This is the mean-value theorem for the derivative.)


In the following statements, it is assumed that the integrals involved exist
and that $b > a$.


Theorem 5 If $|f(x)| \le M$ for $a \le x \le b$, where $M$ is a constant, then

$$\left| \int_a^b f(x)\, dx \right| \le \int_a^b |f(x)|\, dx \le M(b - a)$$


Theorem 6 If $f(x)$ is continuous for $a \le x \le b$, then

$$\int_a^b f(x)\, dx = (b - a) f(\xi)$$

for at least one $\xi$ such that $a < \xi < b$. (This is the first law of the mean.)


Theorem 7 If $m \le f(x) \le M$ and $g(x)$ is nonnegative, for $a \le x \le b$,
then

$$m \int_a^b g(x)\, dx \le \int_a^b f(x) g(x)\, dx \le M \int_a^b g(x)\, dx$$
Theorem 8 If $f(x)$ is continuous for $a \le x \le b$ and $g(x)$ does not
change sign in $[a, b]$, then

$$\int_a^b f(x) g(x)\, dx = f(\xi) \int_a^b g(x)\, dx$$

for at least one $\xi$ such that $a < \xi < b$. (This is the second law of the mean.)


The four following theorems with relation to integrals involving a parameter
are of frequent use:


Theorem 9 If $a$ and $b$ are finite constants and $F(x, s)$ is continuous
in $x$ and $s$, then

$$\lim_{x \to c} \int_a^b F(x, s)\, ds = \int_a^b F(c, s)\, ds$$


Theorem 10 If $a$ and $b$ are finite constants and if $\partial F/\partial x$ is continuous,
then

$$\frac{d}{dx} \int_a^b F(x, s)\, ds = \int_a^b \frac{\partial F(x, s)}{\partial x}\, ds$$


Theorem 11 If $a$ is a finite constant, $u$ is a differentiable function of $x$,
and $\partial F/\partial x$ is continuous, then

$$\frac{d}{dx} \int_a^{u(x)} F(x, s)\, ds = \int_a^{u} \frac{\partial F(x, s)}{\partial x}\, ds + F(x, u) \frac{du}{dx}$$


Theorem 12 If $F_n(x)$ denotes the result of integrating $F(x)$ successively $n$
times over $[a, x]$, then

$$F_n(x) = \frac{1}{(n - 1)!} \int_a^x (x - s)^{n-1} F(s)\, ds$$


The truth of each of these assertions, except perhaps for the last two, 
is nearly self-evident, and the details of their proofs are rather easily supplied 
once the preliminary basic properties of continuous functions are established. 

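The validity of Theorem 11 follows from the fact that if we write

$$I(x) = \int_a^{u(x)} F(x, s)\, ds$$

there follows

$$\begin{aligned}
I(x + \Delta x) - I(x) &= \int_a^{u} [F(x + \Delta x, s) - F(x, s)]\, ds + \int_u^{u + \Delta u} F(x + \Delta x, s)\, ds \\
&= \left( \int_a^{u} F_x(\xi, s)\, ds \right) \Delta x + F(x + \Delta x, \eta)\, \Delta u
\end{aligned}$$

where $\xi$ is between $x$ and $x + \Delta x$ and $\eta$ is between $u$ and $u + \Delta u$, by virtue of
Theorems 4 and 6, and hence

$$\begin{aligned}
I'(x) &= \lim_{\Delta x \to 0} \int_a^{u} F_x(\xi, s)\, ds + \lim_{\Delta x \to 0} F(x + \Delta x, \eta) \frac{\Delta u}{\Delta x} \\
&= \int_a^{u} F_x(x, s)\, ds + F(x, u) \frac{du}{dx}
\end{aligned}$$

by virtue of the content of Theorem 9.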

Theorem 12 can be established by successive integration by parts, or
verified by the use of Theorem 11, making use of the facts that, from the definition,
there follows $F_k'(x) = F_{k-1}(x)$ and $F_k(a) = 0$ for $k = 1, 2, \ldots, n$, and
$F_1'(x) = F_0(x) = F(x)$.

This result is useful in deriving the finite Taylor series, with an error
term expressed in a form which often is more useful than that given in (1.3.2),
and also in deriving the form given there. For if we write

$$F(x) = f^{(n+1)}(x) \equiv \frac{d^{n+1} f(x)}{dx^{n+1}} \qquad (1.9.1)$$

and use the notation of Theorem 12, the results of integrating the equal members
successively over $[a, x]$ are seen to be

$$\begin{aligned}
F_1(x) &= f^{(n)}(x) - f^{(n)}(a) \\
F_2(x) &= f^{(n-1)}(x) - f^{(n-1)}(a) - (x - a) f^{(n)}(a) \\
F_3(x) &= f^{(n-2)}(x) - f^{(n-2)}(a) - (x - a) f^{(n-1)}(a) - \frac{(x - a)^2}{2!} f^{(n)}(a)
\end{aligned}$$

and finally, after $n + 1$ integrations,

$$F_{n+1}(x) = f(x) - f(a) - (x - a) f'(a) - \frac{(x - a)^2}{2!} f''(a) - \cdots - \frac{(x - a)^n}{n!} f^{(n)}(a) \qquad (1.9.2)$$
Thus, after a transposition and a reference to Theorem 12, we deduce that
if the $(n + 1)$th derivative of $f(x)$ exists and is integrable over an interval including
$x = a$, then in that interval there follows

$$f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + E_n(x) \qquad (1.9.3)$$

where

$$E_n(x) = \frac{1}{n!} \int_a^x (x - s)^n f^{(n+1)}(s)\, ds \qquad (1.9.4)$$


Further, since $(x - s)^n$ does not change sign as $s$ varies from $a$ to $x$, we
can invoke the second law of the mean (Theorem 8) to rewrite (1.9.4) in the form

$$E_n(x) = \frac{f^{(n+1)}(\xi)}{n!} \int_a^x (x - s)^n\, ds = \frac{f^{(n+1)}(\xi)}{(n + 1)!} (x - a)^{n+1} \qquad (\xi \text{ between } a \text{ and } x) \qquad (1.9.5)$$

under the additional assumption that $f^{(n+1)}(x)$ is continuous. Whereas the
form (1.9.5) has the advantage of simplicity, the form (1.9.4) often is preferable
because of the fact that it is explicit while (1.9.5) involves a parameter which is
known only to lie between $a$ and $x$.
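
The two forms of the remainder can be compared numerically. The Python sketch below takes $f(x) = e^x$ and $a = 0$ as an assumed test case, evaluates the integral (1.9.4) with a composite Simpson rule (a quadrature chosen here only for convenience), and compares it with the difference between $f(x)$ and its Taylor polynomial and with the bound obtained from (1.9.5). The helper names are hypothetical.

```python
import math

def simpson(func, a, b, panels=1000):
    """Composite Simpson rule; panels must be even."""
    h = (b - a) / panels
    total = func(a) + func(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def taylor_remainder_integral(n, x, a=0.0):
    """E_n(x) from (1.9.4) for f = exp, all of whose derivatives equal exp."""
    return simpson(lambda s: (x - s) ** n * math.exp(s), a, x) / math.factorial(n)

n, x = 4, 1.0
partial = sum(x ** k / math.factorial(k) for k in range(n + 1))
direct = math.exp(x) - partial                   # f(x) minus its Taylor polynomial about a = 0
integral = taylor_remainder_integral(n, x)       # the explicit form (1.9.4)
bound = math.exp(x) * x ** (n + 1) / math.factorial(n + 1)   # (1.9.5) with f^(n+1)(xi) <= e^x
print(direct, integral, bound)
```

The first two printed values should agree to within the quadrature error, while the third is only an upper bound, which reflects the remark that (1.9.4) is explicit and (1.9.5) is not.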

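A useful generalization of the Taylor-series expansion (1.9.3) can be
obtained by starting with the representation

$$F(t) = F(0) + \sum_{k=1}^{n} c_k t^k + E_n \qquad (1.9.6)$$

where

$$c_k = \frac{1}{k!} \left[ \frac{d^k F(t)}{dt^k} \right]_{t=0} \qquad E_n = \frac{t^{n+1}}{(n + 1)!} \left[ \frac{d^{n+1} F(t)}{dt^{n+1}} \right]_{t=\tau} \qquad (1.9.7)$$

with $\tau$ between 0 and $t$, and writing

$$t = g(x) - g(a) \qquad F(t) = f(x) \qquad (1.9.8)$$

under the assumption that

$$g'(x) \ne 0 \qquad (1.9.9)$$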
over some interval $I$ including $x = a$, so that $g(x) - g(a)$ increases or decreases
steadily as $x$ increases over $I$. The result of this substitution takes the form

$$f(x) = f(a) + \sum_{k=1}^{n} c_k [g(x) - g(a)]^k + E_n \qquad (1.9.10)$$

where

$$c_k = \frac{1}{k!} \left[ \left( \frac{1}{g'(x)} \frac{d}{dx} \right)^{k} f(x) \right]_{x=a} \qquad E_n = \frac{[g(x) - g(a)]^{n+1}}{(n + 1)!} \left[ \left( \frac{1}{g'(x)} \frac{d}{dx} \right)^{n+1} f(x) \right]_{x=\xi} \qquad (1.9.11)$$

and where $\xi$ lies between $a$ and $x$, when $x$ is in $I$, under the assumption that
$f^{(n+1)}(x)$ and $g^{(n+1)}(x)$ are continuous and $g'(x) \ne 0$ in $I$.

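If we define a sequence of auxiliary functions $\alpha_0(x), \alpha_1(x), \ldots$ by the
recurrence formula

$$\alpha_{k+1}(x) = \frac{\alpha_k'(x)}{g'(x)} \qquad (k = 0, 1, 2, \ldots) \qquad (1.9.12)$$

with

$$\alpha_0(x) = f(x) \qquad (1.9.13)$$

it follows that

$$c_k = \frac{1}{k!} \alpha_k(a) \qquad E_n = \frac{1}{(n + 1)!} \alpha_{n+1}(\xi)\, [g(x) - g(a)]^{n+1} \qquad (1.9.14)$$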


The expansion (1.9.10) is often known as a Bürmann series and is useful
when a certain value of a function g(x) is known and the corresponding value 
of a second function f(x) is required. The special case when f(x) is identified 
with x itself is of most frequent occurrence. 
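
As a worked illustration of the recurrence (1.9.12) with (1.9.13) and (1.9.14), take the case $f(x) = x$, $g(x) = e^x$, $a = 0$, chosen here for simplicity and not taken from the text. Then $t = g(x) - g(a) = e^x - 1$, and the series should reproduce the familiar expansion of $\log(1 + t)$:

```latex
\alpha_0(x) = x, \qquad
\alpha_1(x) = \frac{1}{e^{x}} = e^{-x}, \qquad
\alpha_2(x) = \frac{-e^{-x}}{e^{x}} = -e^{-2x}, \qquad
\alpha_3(x) = \frac{2e^{-2x}}{e^{x}} = 2e^{-3x}

c_1 = \frac{\alpha_1(0)}{1!} = 1, \qquad
c_2 = \frac{\alpha_2(0)}{2!} = -\frac{1}{2}, \qquad
c_3 = \frac{\alpha_3(0)}{3!} = \frac{1}{3}

x = (e^{x} - 1) - \frac{1}{2}(e^{x} - 1)^2 + \frac{1}{3}(e^{x} - 1)^3 - \cdots
```

Each $\alpha_{k+1}$ requires only one differentiation and one division by $g'(x)$, which is the sense in which this form tends to be less involved than a closed formula for $c_k$.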

It can be shown (see Whittaker and Watson [1927]) that the coefficient
$c_k$ can also be expressed by the formula

$$c_k = \frac{1}{k!} \left[ \frac{d^{k-1}}{dx^{k-1}} \left\{ f'(x) \left[ \frac{x - a}{g(x) - g(a)} \right]^{k} \right\} \right]_{x=a} \qquad (1.9.15)$$

Although this is the form usually given, the use of the form given in (1.9.11)
or (1.9.14) often leads to a somewhat less involved calculation, particularly
when $(x - a)$ cannot be explicitly factored from $g(x) - g(a)$.

We conclude with a few useful basic facts relating to zeros of polynomials,
recalling first the so-called fundamental theorem of elementary algebra, which
states that any polynomial† other than a constant possesses at least one zero.


† The term polynomial is to be used in its common restricted sense to denote an expression
of the form $a_0 x^n + a_1 x^{n-1} + \cdots + a_n$, where $n$ is a nonnegative integer and
the $a$'s are constants.
The usual proofs depend upon results established in the theory of analytic
functions of a complex variable. Elementary treatments assume the truth of
this theorem and deduce easily that any polynomial of degree $n$ possesses exactly
$n$ zeros, with the understanding that the zeros may be real or imaginary,‡ and
with the convention that repeated zeros are to be counted a number of times
equal to their multiplicities. In this connection, $\alpha$ is said to be a zero of multiplicity
$m$ if $(x - \alpha)^m$ is a factor of the polynomial but $(x - \alpha)^{m+1}$ is not.

It is now supposed that the polynomial is monic (that is, that the leading 
coefficient is 1) and that the coefficients are real, in which case the following 
theorem combines four classical results. 


Theorem 13 Let $p(x) = x^n + a_1 x^{n-1} + \cdots + a_n$, where the $a$'s are
real, and let the zeros of $p(x)$ be denoted by $x_1, x_2, \ldots, x_n$, with the
understanding that a zero of multiplicity $m$ is to be assigned $m$ different
subscripts. Then the following statements are valid:


DESCARTES' RULE The number of positive real zeros either equals
the number of sign changes in the coefficient sequence

$$1, a_1, a_2, \ldots, a_n$$

or is smaller by an even integer, whereas the number of negative real
zeros is related in the same way to the coefficient sequence associated with
$p(-x)$.


BUDAN'S RULE If $N_a$ and $N_b$ denote the number of sign changes in
the respective sequences

$$p(a), p'(a), p''(a), \ldots, p^{(n)}(a)$$

and

$$p(b), p'(b), p''(b), \ldots, p^{(n)}(b)$$

then the number of real zeros in the interval $a < x < b$ either equals
$N_a - N_b$ or is smaller by an even integer.


NEWTON'S PRODUCT-SUM IDENTITIES If $q_r$ denotes the sum of all
possible products of $r$ zeros with distinct subscripts, so that $q_1 = x_1 +
x_2 + \cdots + x_n$, $q_2 = x_1 x_2 + x_1 x_3 + \cdots$, and so forth, then

$$q_r = (-1)^r a_r \qquad (r = 1, 2, \ldots, n)$$


‡ A complex number $a + ib$, with $a$ and $b$ real, will be said to be imaginary (or nonreal)
if $b \ne 0$ and to be pure imaginary if also $a = 0$.
NEWTON'S POWER-SUM IDENTITIES If $s_r$ denotes the sum of the $r$th
powers of the zeros

$$s_r = x_1^r + x_2^r + \cdots + x_n^r$$

then

$$s_r + s_{r-1} a_1 + \cdots + s_1 a_{r-1} + r a_r = 0 \qquad (r = 1, 2, \ldots, n)$$

and

$$s_r + s_{r-1} a_1 + \cdots + s_{r-n} a_n = 0 \qquad (r = n + 1, n + 2, \ldots)$$

The rules of Descartes and Budan (proofs omitted) are occasionally
useful for the purposes of roughly locating zeros of a polynomial or of establishing
their absence in a certain interval. There exists an extensive class of more
powerful (and less elementary) theorems for such purposes (see Marden [1966]
and Householder [1970]).
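
Descartes' rule reduces to counting sign changes, which is easily mechanized. The Python sketch below is illustrative: the function names are hypothetical, zero coefficients are skipped when counting (an assumed but customary convention), and the argument `a` lists $a_1, \ldots, a_n$ of the monic polynomial of Theorem 13.

```python
def sign_changes(seq):
    """Count sign changes in a sequence, ignoring zero entries."""
    signs = [v > 0 for v in seq if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def descartes_bounds(a):
    """Upper bounds on the numbers of positive and negative real zeros of
    p(x) = x^n + a[0] x^(n-1) + ... + a[n-1]."""
    coeffs = [1.0] + list(a)
    n = len(a)
    # Coefficients of p(-x): the term of degree n - i is multiplied by (-1)^(n - i).
    reflected = [c * (-1) ** (n - i) for i, c in enumerate(coeffs)]
    return sign_changes(coeffs), sign_changes(reflected)

# p(x) = (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
print(descartes_bounds([-6.0, 11.0, -6.0]))   # (3, 0)
```

For this cubic the rule allows three or one positive zeros and no negative zeros, in agreement with the actual zeros 1, 2, 3.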


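The well-known product-sum identities are easily established by exploiting
the equivalence

$$(x - x_1)(x - x_2) \cdots (x - x_n) = x^n + a_1 x^{n-1} + \cdots + a_n \qquad (1.9.16)$$

In order to derive the power-sum identities, we may notice first that

$$\frac{p'(x)}{p(x)} = \frac{1}{x - x_1} + \frac{1}{x - x_2} + \cdots + \frac{1}{x - x_n} \qquad (1.9.17)$$

and that

$$\frac{1}{x - x_i} = \sum_{r=0}^{\infty} \frac{x_i^r}{x^{r+1}} \qquad (1.9.18)$$

when $|x|$ is sufficiently large. Hence substitution yields the equation

$$p'(x) = p(x) \left( \frac{n}{x} + \frac{s_1}{x^2} + \frac{s_2}{x^3} + \cdots + \frac{s_r}{x^{r+1}} + \cdots \right) \qquad (1.9.19)$$

from which the desired relations are obtained by equating coefficients of like
powers of $x$ in the expansions

$$(x^n + a_1 x^{n-1} + \cdots + a_n) \left( \frac{n}{x} + \frac{s_1}{x^2} + \frac{s_2}{x^3} + \cdots \right) = n x^{n-1} + (n - 1) a_1 x^{n-2} + \cdots + a_{n-1} \qquad (1.9.20)$$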
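
The power-sum identities determine $s_1, s_2, \ldots$ successively from the coefficients alone, without knowledge of the zeros. A minimal Python sketch follows; the function name `power_sums` is hypothetical, and `a[r - 1]` plays the role of $a_r$.

```python
def power_sums(a, count):
    """Power sums s_1..s_count of the zeros of x^n + a[0] x^(n-1) + ... + a[n-1],
    obtained from Newton's power-sum identities."""
    n = len(a)
    s = [0.0] * (count + 1)              # s[r] holds s_r; s[0] is unused
    for r in range(1, count + 1):
        acc = 0.0
        for j in range(1, min(r - 1, n) + 1):
            acc += s[r - j] * a[j - 1]   # terms s_{r-j} a_j
        if r <= n:
            acc += r * a[r - 1]          # the extra term r a_r, present only when r <= n
        s[r] = -acc
    return s[1:]

# Zeros 1, 2, 3: p(x) = x^3 - 6x^2 + 11x - 6
print(power_sums([-6.0, 11.0, -6.0], 4))   # [6.0, 14.0, 36.0, 98.0]
```

The printed values agree with $1^r + 2^r + 3^r$ for $r = 1, \ldots, 4$; the last of them uses the second identity, since $r = 4$ exceeds $n = 3$.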


1.10 Supplementary References 


The Bibliography (Appendix B) lists some of the existing general texts on 
numerical analysis, together with a selection of collateral texts, journal references, 
and sources of tables. For historical references to early developments and 
contributions, see Whittaker and Robinson [1944], Nörlund [1954], and
Kopal [1961]. Todd [1962] presents a concise critical summary of many
aspects of "classical" numerical analysis. For interesting examples of challenges
and pitfalls, see Stegun and Abramowitz [1956] and Forsythe [1958].

The fundamental existence proof of Weierstrass [1885] on uniform 
polynomial approximation was supplemented by a constructive proof by 
Bernstein [1912a], which actually displayed a qualifying polynomial for each 
specified error tolerance. For references to the more modern theory of ap- 
proximation, see Sec. 9.19. 

The logic of computer arithmetic is treated in references such as Flores 
[1960, 1963]. Numerous texts which combine treatments of theoretical aspects 
of numerical analysis with accounts of computer programming and related 
topics, in varying ratios, include Moursund and Duris [1967], Pennington 
[1970], and Arden and Astill [1970]. 

The classical paper on the effects of roundoff errors in large-scale numerical 
computation is Rademacher [1948]. For later developments and accounts, 
see chapters in Hamming [1962] and Henrici [1964] as well as the more com- 
prehensive treatments in Wilkinson [1964] and Rall [1965]. (The last two 
references also contain extensive bibliographies.) 

Feller [1968] and Cramér [1946] are standard references for topics in 
probability and statistics, and Burnside and Panton [1935] is good for classical 
results in the theory of equations. 


PROBLEMS 
Section 1.2 


1 Determine $A_0$, $A_1$, and $A_2$ such that the function $y(x) = A_0 + A_1 x + A_2 x^2$ and
the function $f(x) = 1/(1 + x)$ have each of the following sets of properties in
common:

(a) f(0), £4), fC) 

(ὁ) f(0), f(O), FO) 
(c) £4), f’A) f'A) 
(d) f’(0), £4), f’A) 


(e) $\int_0^1 f(x)\, dx$, $\int_0^1 x f(x)\, dx$, $\int_0^1 x^2 f(x)\, dx$
2 Calculate three-place values of the function f(x) = 1/(1 + x) and each of the 
parabolic approximations obtained in Prob. 1 at a spacing of 0.1 over [0, 1], 
and plot curves representing the errors in each approximation on a common 
graph. 
3 Proceed as in Prob. 1 (when this is possible) with the approximation $y(x) =
B_0 + B_1 \cos 2\pi x + B_2 \sin 2\pi x$.
4 Determine that member $y(x)$ of the set of all linear functions which best approximates
the function $f(x) = x^2$ over $[0, 1]$ in the sense that each of the following
quantities is minimized:

(a) $\int_0^1 [f(x) - y(x)]^2\, dx$

(b) $[f(0) - y(0)]^2 + [f(\tfrac12) - y(\tfrac12)]^2 + [f(1) - y(1)]^2$

(c) $\max_{0 \le x \le 1} |f(x) - y(x)|$

(d) $\int_0^1 x(1 - x)[f(x) - y(x)]^2\, dx$
5 Determine $c_1$, $c_2$, and $c_3$ in such a way that the formula

$$\int_{-1}^{1} w(x) f(x)\, dx = c_1 f(-1) + c_2 f(0) + c_3 f(1)$$

yields an exact result when $f(x)$ is 1, $x$, $x^2$, and $x^3$, and hence also when $f(x)$ is any
linear combination of those functions, for each of the following weighting functions:

(a) $w(x) = 1$; (b) $w(x) = \sqrt{1 - x^2}$; (c) $w(x) = \dfrac{1}{\sqrt{1 - x^2}}$
Section 1.3 


6 Let $S = u_0 + u_1 + \cdots + u_k + R_k$ for $k = 0, 1, \ldots$. By noticing that

$$u_k + R_k = R_{k-1} \qquad u_{k+1} + R_{k+1} = R_k$$

deduce that if $R_k$ and $R_{k-1}$ have opposite signs, then $R_k$ is smaller than $u_k$ in magnitude,
and is of opposite sign, whereas if also $R_k$ and $R_{k+1}$ have opposite signs, then
$R_k$ is also smaller than $u_{k+1}$ in magnitude, and is of the same sign. (This is often
known as Steffensen's error test.)

7 Let $S_k = v_0 - v_1 + v_2 - \cdots + (-1)^{k-1} v_{k-1}$ for $k = 1, 2, \ldots$, where all $v$'s are
positive. Assume also that $v_{k+1} < v_k$ for all $k$, and that $v_k \to 0$ as $k \to \infty$. Show
that $S_{2k}$ is positive and increasing with $k$, but that $S_{2k}$ cannot exceed $v_0$. Hence
deduce that $S_{2k}$ tends to a limit as $k \to \infty$. Show also that $S_{2k+1}$ tends to the
same limit, and hence that the series $\sum_{k=0}^{\infty} (-1)^k v_k$ then converges to a limit $S$.
Finally, show that the truncation error $R_k$ is of the same sign as the first neglected
term and is smaller than that term in magnitude. (Notice that any finite
number of terms not satisfying the stated requirements may be added to the
series initially, without impairing its convergence.)

8 Suppose that the alternating series

$$S = v_0 - v_1 + v_2 - \cdots = \sum_{k=0}^{\infty} (-1)^k v_k$$

converges. Show that the series

$$\tfrac12 v_0 + \tfrac12 (v_0 - v_1) - \tfrac12 (v_1 - v_2) + \cdots = \tfrac12 v_0 + \tfrac12 \sum_{k=0}^{\infty} (-1)^k (v_k - v_{k+1})$$

converges to the same sum.

9 Use the transformation of Prob. 8 repeatedly to show that

$$\begin{aligned}
S = 1 - \tfrac12 + \tfrac13 - \tfrac14 + \cdots &= \sum_{k=0}^{\infty} \frac{(-1)^k}{k + 1} \\
&= \frac12 + \frac12 \sum_{k=0}^{\infty} \frac{(-1)^k}{(k + 1)(k + 2)} \\
&= \frac12 + \frac18 + \frac12 \sum_{k=0}^{\infty} \frac{(-1)^k}{(k + 1)(k + 2)(k + 3)} \\
&= \frac12 + \frac18 + \frac1{24} + \frac34 \sum_{k=0}^{\infty} \frac{(-1)^k}{(k + 1)(k + 2)(k + 3)(k + 4)}
\end{aligned}$$

Show that the retention of five terms in the last sum given ensures that $0.69306 <
S < 0.69330$ or that $S \approx 0.69318$ with a maximum error of ±12 units in the
place of the fifth digit. About how many terms of the original series would be
needed to ensure this accuracy? (The true value is $S = \log 2 = 0.69315$.)


10 If $f(x)$ is a positive decreasing function of $x$, and if $\int_K^{\infty} f(x)\, dx$ exists for some $K$,
show that $\sum_{k=K}^{\infty} f(k)$ converges. Show also that

$$\int_K^{\infty} f(x)\, dx < \sum_{k=K}^{\infty} f(k) < \int_{K-1}^{\infty} f(x)\, dx$$


How many terms of the series 


00 
ck 


would be required to determine the sum to four digits? 
11 By making appropriate use of the known results


evaluate the sum 


Ait 


fz a oe + τ᾿ τ] 
correctly to four digits. 


12 The error function is defined by the relation

$$\operatorname{erf} x = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt$$

It is known that $\operatorname{erf} x \to 1$ as $x \to \infty$. With the definitions

$$F_1(x) = \int_0^x e^{-t^2}\, dt \qquad F_2(x) = \int_x^{\infty} e^{-t^2}\, dt$$

there follows

$$\operatorname{erf} x = \frac{2}{\sqrt{\pi}} F_1(x) = 1 - \frac{2}{\sqrt{\pi}} F_2(x)$$

Show that

$$F_1(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k + 1)\, k!}$$

where the series converges for all $x$. About how many terms would be required for
five-digit accuracy when $x = 0.2$, 1, and 2?
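13 With the notation of Prob. 12, make use of repeated integration by parts to show
that

$$\int_0^x e^{-t^2}\, dt = e^{-x^2} \left[ x + \frac23 x^3 + \frac23 \cdot \frac25 x^5 + \cdots + \frac23 \cdot \frac25 \cdots \frac{2}{2n - 1} x^{2n-1} \right] + 2 \cdot \frac23 \cdot \frac25 \cdots \frac{2}{2n - 1} \int_0^x e^{-t^2} t^{2n}\, dt$$

and hence that

$$F_1(x) = e^{-x^2} \left[ x + \frac23 x^3 + \frac23 \cdot \frac25 x^5 + \cdots + \left( \frac23 \cdot \frac25 \cdots \frac{2}{2n - 1} \right) x^{2n-1} \right] + E_n(x)$$

where

$$E_n(x) = \frac{2^n}{1 \cdot 3 \cdot 5 \cdots (2n - 1)} \int_0^x e^{-t^2} t^{2n}\, dt$$

Show also that $E_n(x)$ is smaller in magnitude than the term following the last one
retained in the coefficient of $e^{-x^2}$, and is of the same sign, that the relative error
cannot exceed $(2x)^{2n} n!/(2n)!$, and that the infinite series obtained when $n \to \infty$
converges for all $x$.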

14 With the notation of Prob. 12, show that 


e F(x) = [ (erry 


where 


and, after successive integrations by parts (each followed by multiplication and 
division by ¢ in the integrand), deduce that 


11,131 
ee 
2x3 2 2x5 


n(1.3..2n-1\ 1 
+o"; aes mae Pe 


15 


16 
and hence that


— x2 
erf x = eee Poii 13: 
ache 20° 2.2. χ᾽ 
1 3 2n — 1 1 
Se {-π|ὴ)Ὴ ass + E(x 
i : 15:2} 6) 
where 
2 (1 3 2n + 1 ~ _» at 
E(x) = (-—1)"—=[{--=::: gor οι 
(x) = ( fa (5 1 ) πα 


Show also that the series is divergent but asymptotic (in the strict sense) and that 
the truncation error due to neglect of E,(x) is smaller than the last term retained 
and of opposite sign. Obtain the best possible approximation when x = 2, and 
give numerical bounds on the error. 

15 By making use of the known expansions of $e^x$, $\cos x$, $\sin x$, and $(1 - x)^{1/2}$ in
powers of $x$, obtain the coefficients of powers of $x$ through the fourth in the
corresponding expansions of the following functions:


(a) e* cos x; (b) —— ; (c) (cos x)*/?; (d) ὁ Ὁ» 
cos x 


16 Under the assumption that a given series

$$y = a_0 + a_1 x + a_2 x^2 + \cdots \qquad (a_1 \ne 0)$$

converges for sufficiently small values of $x$, and that $x$ can be expanded in a series
of powers of $(y - a_0)/a_1$ which converges for $y$ sufficiently near to $a_0$, in the form

$$x = u + A_2 u^2 + A_3 u^3 + \cdots \qquad \left( u = \frac{y - a_0}{a_1} \right)$$

show that the leading coefficients in the inverted series can be determined from the
relations

$$\begin{aligned}
a_1 A_2 &= -a_2 \\
a_1 A_3 &= -2a_2 A_2 - a_3 \\
a_1 A_4 &= -a_2 (A_2^2 + 2A_3) - 3a_3 A_2 - a_4 \\
&\;\;\vdots
\end{aligned}$$

Show also that the first $n$ terms of the inverted series can be obtained by a sequence
of $n - 1$ substitutions in the right-hand member of the relation

$$x = u - \frac{a_2}{a_1} x^2 - \frac{a_3}{a_1} x^3 - \cdots$$

starting with $x = u$, and retaining only powers of $u$ not exceeding the $(r + 1)$th
in the $r$th substitution. Illustrate both methods in obtaining the first four terms
in the result of inverting the series $e^x = 1 + x + \tfrac12 x^2 + \cdots$.
17 It is required to determine the symmetrically placed pair of nonzero roots of the
equation

$$\sinh x = cx$$

where $c$ is a real constant such that $c > 1$. Show that, with the abbreviations
$s = 6(c - 1)$, $t = x^2$, the problem can be considered as that of inverting the
series

$$s = t + \frac{t^2}{20} + \frac{t^3}{840} + \frac{t^4}{60480} + \cdots$$

and deduce the expansion

$$x^2 = s - \frac{1}{20} s^2 + \frac{2}{525} s^3 - \frac{13}{37800} s^4 + \cdots$$


Sections 1.4 and 1.5 


18 Show that the number $(2.46)^{1/64}$ is known within less than one unit in the place
of its fifth significant digit if 2.46 is known only to be correctly rounded to three
digits.

19 Using only five-place tables of sin x and cos x, determine cos 0.10 - cos 0.12
and tan 0.12 - tan 0.10 to four significant figures.

20 Values of cos x are calculated from a five-place table of sin x by use of the formula
$\cos x = (1 - \sin^2 x)^{1/2}$. What can be said about the accuracy of the calculated
values?

21 If all coefficients in the definition

$$f(x) = \frac{5.03241x + 0.11095}{0.75995x + 0.014915}$$

are rounded numbers, to how many significant figures is f(x) determinate when x
is known only to round to 3.26?

22 If f(x) = (sinh x - sin x)/(cosh x - cos x), determine f(0.1) to 10 significant
figures.

23 Determine bounds on the degree of indeterminacy of each of the quantities
$\tan^{-1} 4.017216$, $\sin^{-1} 0.986423$, cos 18.4178, and cos 18417.8, under the
assumption that the arguments are rounded values. To how many significant figures are
the last two quantities determinate?


Section 1.6 


24 


Suppose that calculations are to be made in four-digit floating-point arithmetic, 
assuming a double-precision accumulator, but supposing that the computer 
rounds the number resulting from each operation (addition, multiplication, etc.) 
to four digits before effecting a subsequent operation on that number. If 


x, = 0.1234 x 103 χ; = 0.3456 x 10? x3 = 0.5678 x 10? 
are exact numbers, evaluate the results of each of the following machine operations
and, in each case, determine the absolute and relative errors associated with the 
result. 


(a) (x; Φ X2) Ὁ x3 

(ὁ) (x3 Ὁ x2) Φ x1 

(ὦ (x1 © x2) © x3 

(4) (x3 © X>) ΘΟ ΧΙ, 

(0) x1 O (2 Φ x3) 

(f) αι, © x2) ® (x; Φ x3) 

(9) (X1 © x3) ὦ x2 

(h) x1 © (3 DO x) 

(ἢ {[(%1 ® x2) © x2] © x1} © 0.1000 x 10! 
Οὐ) {[1 ® x2) © x1] © x2} © 0.1000 x 10] 
25 If $f_1(x)$ and $f_2(x)$ are the frequency functions of $\varepsilon_1$ and $\varepsilon_2$, respectively, where
$\varepsilon_1$ and $\varepsilon_2$ are independent random variables, show that the distribution function of
$\varepsilon_1 + \varepsilon_2$ is

$$\iint_{s+t<x} f_1(s) f_2(t)\,ds\,dt = \int_{-\infty}^{x}\int_{-\infty}^{\infty} f_1(u - t) f_2(t)\,dt\,du$$

and hence that the frequency function of $\varepsilon_1 + \varepsilon_2$ is

$$f(x) = \int_{-\infty}^{\infty} f_1(x - t) f_2(t)\,dt$$

26 Use the result of Prob. 25 to show that, if $\varepsilon_1$ and $\varepsilon_2$ are independent and are
normally distributed about zero means, with standard deviations $\sigma_1$ and $\sigma_2$, then
$\varepsilon_1 + \varepsilon_2$ is also normally distributed about a zero mean, with standard deviation
$\sigma = (\sigma_1^2 + \sigma_2^2)^{1/2}$. [Determine constants $\lambda_1$, $\lambda_2$, and a such that

$$\frac{(x - t)^2}{\sigma_1^2} + \frac{t^2}{\sigma_2^2} = \frac{x^2}{\lambda_1^2} + \frac{(t - ax)^2}{\lambda_2^2}$$

and set $t - ax = \sqrt{2}\,\lambda_2 v$, making use of the fact that

$$\int_{-\infty}^{\infty} e^{-v^2}\,dv = \sqrt{\pi}$$

in evaluating the integral defining the required frequency function.]

27 Suppose that $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent random variables with a common
uniform frequency function

$$f_1(x) = \begin{cases} 1 & (-\tfrac{1}{2} \le x \le \tfrac{1}{2}) \\ 0 & (\text{otherwise}) \end{cases}$$

and denote the frequency function of $\varepsilon_1 + \varepsilon_2 + \cdots + \varepsilon_n$ by $f_n(x)$. Use the result
of Prob. 25 to show that

$$f_{n+1}(x) = \int_{-\infty}^{\infty} f_1(x - t) f_n(t)\,dt = \int_{x-1/2}^{x+1/2} f_n(t)\,dt$$

In particular, deduce that $f_2(x)$ is a triangular function

$$f_2(x) = \begin{cases} 1 + x & (-1 \le x \le 0) \\ 1 - x & (0 \le x \le 1) \\ 0 & (\text{otherwise}) \end{cases}$$

and that $f_3(x)$ is defined by the relations

$$f_3(x) = \begin{cases} \tfrac{1}{2}(\tfrac{3}{2} + x)^2 & (-\tfrac{3}{2} \le x \le -\tfrac{1}{2}) \\ \tfrac{3}{4} - x^2 & (-\tfrac{1}{2} \le x \le \tfrac{1}{2}) \\ \tfrac{1}{2}(\tfrac{3}{2} - x)^2 & (\tfrac{1}{2} \le x \le \tfrac{3}{2}) \\ 0 & (\text{otherwise}) \end{cases}$$

Finally, plot each of the functions $f_1$, $f_2$, and $f_3$, and compare it graphically with
the frequency function corresponding to the normal distribution which has the
same standard deviation $\sigma_n = \sqrt{n/12}$ (n = 1, 2, 3).

28 If the coefficients of the polynomial

$$f(x) = \sum_{k=0}^{n} a_k x^k$$

are independently subject to random error distributions with mean value zero
and with a common RMS value $\delta_{\mathrm{RMS}}$, whereas x is subject to an error distribution
with RMS value $\eta_{\mathrm{RMS}}$, show that the corresponding RMS error $\varepsilon_{\mathrm{RMS}}$ in f(x) is
given approximately by

$$\varepsilon_{\mathrm{RMS}}^2 = \frac{x^{2n+2} - 1}{x^2 - 1}\,\delta_{\mathrm{RMS}}^2 + [f'(x)]^2\,\eta_{\mathrm{RMS}}^2$$

29 Use the result of Prob. 28 to estimate the RMS error in the calculated value of

$$f(x) = 1.47x^3 - 2.48x^2 + 2.21x - 1.65$$

when x = 2.03, under the assumption that the values of x and the coefficients are
known only to be rounded correctly to the three digits given. Within what limits
is f(x) actually determinate in this case? Within what limits does its value lie with
probability of about 0.9?

30 If $x_1, x_2, \ldots, x_r$ are each rounded to n decimal places, show that the
corresponding RMS error in $f(x_1, x_2, \ldots, x_r)$ is approximated by

$$\varepsilon_{\mathrm{RMS}} \approx (0.29 \times 10^{-n})\left[\sum_{k=1}^{r}\left(\frac{\partial f}{\partial x_k}\right)^2\right]^{1/2}$$
Show also that if


ge | 
= OX, 


and if r is not too small (say r > 3), then the odds are about 10 to 1 that the error 
in f does not exceed K units in the nth decimal place. 
31


32 


33 


34 


35 


Evaluate the function 


: cos kx 
XxX — 
70) Κ- 


k=O 


to five decimal places (2) when x = 7/3 and (δὴ) when cos x = 0.2. 
The kth Legendre polynomial P,(x) satisfies the recurrence formula 


$$P_{k+1}(x) - \frac{2k+1}{k+1}\,x P_k(x) + \frac{k}{k+1}\,P_{k-1}(x) = 0$$

with $P_0(x) = 1$ and $P_1(x) = x$. Evaluate the function
10 
(—1)* 

x)= -------- P(x 
f(x) Der (x) 


to five decimal places when x = 0.102. 

Determine how many decimal places should be retained in order to evaluate 
P;(0.102) safely to five decimal places by use of the recurrence formula of Prob. 
32; then make the calculation. Finally, obtain P(x) analytically and check your 
result. 

Use a method of backward recursion to determine $J_5(1)$ to five significant figures,
given that $J_0(1) = 0.7651977$. (The true value rounds to 0.000249758.)

Indicate why the use of (1.8.21) with (1.8.20) should not be expected to be
appropriate for the numerical approximation of $J_n(1)$ when n is fairly large (by
forward recursion), and verify this fact numerically when n = 5, retaining only
five decimal places. [Show that (1.4.19) can be used to explain the unfavorable
error propagation.]


Section 1.9†


36 If [a, b] = [-1, 1], show that the conclusions of Theorems 1 and 2 do not hold
for f(x) = 1/x, that those of Theorems 3 and 4 do not hold for $f(x) = 1 - x^{2/3}$,
and that those of Theorems 7 and 8 do not hold for f(x) = x and $g(x) = x^3$.
Account for each of these situations.


† The truth of the theorems stated in Sec. 1.9 may be assumed in the following problems.
37 Assuming the fact that


38 


39 


40 


41 


42 


[ Sta=3 
ar 2 
show that 
π 
--- x <0 
; ( ) 
wm ,.," 
[, “=«- 0 (x = 0) 
0 5 
π 
= x>0d0 
᾿ ( ) 


Τ hus show that the conclusions of Theorems 9 and 10 do not hold for 


sin xs 


F(x, s) = 


-when a = Oandb = o. 
If f(x) vanishes at n + 1 distinct points in the interval a ≤ x ≤ b, and if $f^{(n)}(x)$
exists for a < x < b, show that $f^{(n)}(x)$ vanishes at least once in (a, b).
Ifa, > Oforr = 1, 2,..., , show that 
n 


a, 510 1 + a, sin2t +--+ + a,sin nt = βίῃ θὲ > a, 


r=1 


for some @ such that 1 < 0 < n. 


Show that 
© at 1 
< —— x >0 
[ t*+1 3x3 
and that 
x 2 —t2 dt 3 3 
t*e * ——— < 1]ορί + x”) < 4x (x > 0) 
0 ae we 
Show that 


1 
: [ (1 -- x2) f(x) dx = $f 
-1 


for some € in (— 1, 1) if f(x) is continuous in that interval. Also determine ξ when 


f(x) = x’. 
If F(k) is defined by the integral 


ra) = [ π΄ -- τὰ (k = 2) 
; 


use the second law of the mean to show that 


F(k) = (1+ & 1. ἘΞ: €)(k -- 2 - Crt — ξ) 


ἘΠῚ (0 «ξ <1) 
and deduce that
1 


1 
_1)k+1 i 
6k(k — 1) i οὐ λό νὰ 6k 


If g(x) is continuous and f(x) possesses a continuous derivative, and if

$$\phi(x) = \int_0^x f(x - t)\,g(t)\,dt$$

obtain an expression for $d\phi/dx$. By making an appropriate change of variables in
the definition of $\phi(x)$, obtain an alternative expression for $d\phi/dx$ when the
hypotheses regarding f and g are interchanged.

If $y'' = 2\sin y + 12x^2$ and $y(0) = y'(0) = 0$, show that

$$y(x) = x^4 + 2\int_0^x (x - s)\sin y(s)\,ds$$

and deduce that y(x) lies between $x^4 - x^2$ and $x^4 + x^2$.

Determine the first three coefficients in the Bürmann series

$$\sin x = c_1(e^x - 1) + c_2(e^x - 1)^2 + c_3(e^x - 1)^3 + \cdots$$

and use the result to determine approximately the value of sin x when $e^x = 1.012$.

If $y = a_0 + a_1 x + a_2 x^2 + \cdots$ and if x can be expanded in a series of powers
of $y - a_0$ for y near $a_0$, use the Bürmann expansion to show that the leading
coefficients in the expansion

$$x = c_1(y - a_0) + c_2(y - a_0)^2 + c_3(y - a_0)^3 + \cdots$$

are given by

$$c_1 = \frac{1}{a_1} \qquad c_2 = -\frac{a_2}{a_1^3} \qquad c_3 = \frac{2a_2^2 - a_1 a_3}{a_1^5}$$

and verify that the results agree with those of Prob. 16.

Derive the leading terms in the Bürmann expansion of x in powers of sin x in the
form

$$x = \sin x + \tfrac{1}{6}\sin^3 x + \tfrac{3}{40}\sin^5 x + \cdots$$


and use the result to approximate βίη ἢ 4. 
Show that Budan’s rule reduces to Descartes’ rule when a = 0 and b = ∞.
Then use this fact to show that either rule is deducible from the other. 
A polynomial p(x) with real coefficients has 2m consecutive vanishing coefficients. 
Show that at least 2m zeros of p(x) either vanish or are nonreal. Also illustrate this
conclusion when p(x) = x° + x?. 
Show that the polynomial 

$$P(x) = x^4 - 5x^3 + 5x^2 + 5x - 5$$


has one negative zero, one zero in (0, 1), either no zeros or two zeros in (2, 3), 
and no other real zeros. 
Determine for what real values of c it is true that neither root of the equation

$$x^2 - 2(1 - 2c)x + 1 = 0$$

exceeds unity in magnitude.
Write out in detail the indicated derivation of Newton’s power-sum identities. 
Determine the sum of the rth powers of the zeros of the polynomial 


$$p(x) = x^4 - 2x^3 + 3x^2 - 4x + 5$$
for r = 1, 2, 3, 4, and 5. 


2 


INTERPOLATION WITH DIVIDED DIFFERENCES 


2.1 Introduction 


Anyone who has had occasion to consult tables of mathematical functions is 
familiar with the method of linear interpolation and probably has encountered 
situations in which this method of “reading between the lines of the table” has
appeared to be unreliable. If more reliable interpolates are desired, it is clearly 
necessary to make use of more information than that consisting of tabulated 
values (ordinates) of a function, corresponding to only two successive abscissas. 
Whereas that additional information could consist, for example, of known 
values of certain derivatives of the function at those two points, it is supposed 
in most of what follows (an exception is found in Sec. 8.2) that the interpolation 
process is to be based only on tabulated values of the function itself, with any 
further available information reserved for use in estimating the error involved. 

There exist a number of interpolation formulas which have this property, 
most of which possess certain advantages in certain situations, but no one of 
which is preferable to all others in all respects. Whereas certain of these formulas 
are expressed explicitly in terms of all the ordinates on which they depend 
(Chap. 3), others involve only one or two of the ordinates explicitly and express 
their dependence upon other ordinates only in terms of differences of ordinates
and successive differences of differences. 

In the general case, when the abscissas are not necessarily equally spaced, 
the use of so-called divided differences affords certain conveniences. The 
principal purpose of this chapter is to define such differences and investigate 
certain of their properties, to obtain a basic interpolation formula due to 
Newton (Sec. 2.5), from which most of the other formulas of the type described 
can be deduced, and to obtain expressions for the error term (Sec. 2.6). Related 
methods of iterated linear interpolation (Sec. 2.7) and inverse interpolation 
(Sec. 2.8) are also treated. 


2.2 Linear Interpolation 


The assumption that a function f(x) is approximately linear, in a certain range,
is equivalent to the assumption that the ratio

$$\frac{f(x_1) - f(x_0)}{x_1 - x_0} \tag{2.2.1}$$

is approximately independent of $x_0$ and $x_1$ in that range. This ratio is called
the first divided difference of f(x), relative to $x_0$ and $x_1$, and may be designated
by $f[x_0, x_1]$:†

$$f[x_0, x_1] = \frac{f(x_1) - f(x_0)}{x_1 - x_0} \tag{2.2.2}$$

It is clear that $f[x_1, x_0] = f[x_0, x_1]$.

Thus the assertion of approximate linearity may be expressed in the form

$$f[x_0, x] \approx f[x_0, x_1] \tag{2.2.3}$$

which leads to the interpolation formula

$$f(x) \approx f(x_0) + (x - x_0)\,f[x_0, x_1] \tag{2.2.4}$$

or

$$f(x) \approx f(x_0) + \frac{x - x_0}{x_1 - x_0}\,[f(x_1) - f(x_0)] \tag{2.2.4'}$$

or, equivalently, to the formula

$$f(x) \approx \frac{1}{x_1 - x_0}\,[(x_1 - x)\,f(x_0) - (x_0 - x)\,f(x_1)] \tag{2.2.4''}$$

which can also be expressed in the convenient determinantal form

$$f(x) \approx \frac{1}{x_1 - x_0}\begin{vmatrix} f(x_0) & x_0 - x \\ f(x_1) & x_1 - x \end{vmatrix} \tag{2.2.5}$$

† Various other notations are used, such as $[x_0, x_1]$, $f(x_0, x_1)$, and $(x_0, x_1)$.

It may be noticed that (2.2.4) involves one ordinate and a divided difference,
(2.2.4′) involves one ordinate and an ordinary difference, and (2.2.4″) involves
the two ordinates directly.

It is convenient to designate the linear function defined by the right-hand
member of (2.2.4) by $p_{0,1}(x)$, the subscripts corresponding to the ordinates
used in its formation. For symmetry of notation, it is desirable to write also

$$p_0(x) = f[x_0] = f(x_0) \tag{2.2.6}$$

so that $f[x_0]$ is defined as the zeroth divided difference relative to $x_0$ and is
merely the value of f(x) at $x = x_0$, and $p_0(x)$ is the approximating polynomial
of degree zero which agrees with f(x) at $x = x_0$. With this notation, Eqs.
(2.2.4) and (2.2.5) become

$$f(x) \approx p_{0,1}(x) = f[x_0] + (x - x_0)\,f[x_0, x_1] \tag{2.2.7}$$

and

$$p_{0,1}(x) = \frac{1}{x_1 - x_0}\begin{vmatrix} p_0(x) & x_0 - x \\ p_1(x) & x_1 - x \end{vmatrix} \tag{2.2.8}$$

These forms are given here principally to correspond to more general forms to
be obtained in following sections.

We see that the approximation $f(x) \approx p_{0,1}(x)$ is exact for all values of
x if f(x) is indeed a linear function, of the form $f(x) = A_0 + A_1 x$, and further
that the approximation is exact at the points $x = x_0$ and $x_1$ for any function f(x).

As a numerical example, the linear interpolation of sinh x for x = 0.23,
from tabulated five-place values for $x_0 = 0.20$ and $x_1 = 0.30$, may be arranged
as follows:

    x_i      f(x_i)     x_i - x
    0.20     0.20134    -0.03
    0.30     0.30452     0.07

$$f(0.23) \approx \frac{(0.20134)(0.07) - (-0.03)(0.30452)}{0.10} = 0.23229$$

Since the true five-place value is 0.23203, it is seen that linear interpolation here
affords only three-place accuracy.
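The arithmetic of this example can be checked with a short computation. The following Python sketch is an illustration only: the function name `linear_interpolate` is an assumption introduced here, and it evaluates the two-ordinate form of the linear interpolation formula on the tabulated values quoted in the text.

```python
import math

def linear_interpolate(x0, f0, x1, f1, x):
    """Weighted average of the two ordinates f0 = f(x0) and f1 = f(x1)."""
    return ((x1 - x) * f0 - (x0 - x) * f1) / (x1 - x0)

# Five-place tabulated values of sinh x quoted in the text.
approx = linear_interpolate(0.20, 0.20134, 0.30, 0.30452, 0.23)
true_value = math.sinh(0.23)

print(round(approx, 5), round(true_value, 5))
```

Comparing the two printed values shows agreement in the first three decimal places only, as stated above.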

It is useful to notice that, since a linear interpolation merely effects a 
certain weighted average of the two ordinates involved, the result of an inter- 
polation involving two ordinates such as 13.6340 and 13.6393 can be considered 
as the sum of 13.6300 and the result of effecting the same interpolation on 40 
and 93, with this result added to 13.6300 in units of its last place. 

Further, since the numerator and denominator of the ratio (2.2.5) are
homogeneous in the abscissas, the entries $x_i$ and $x_i - x$ in the computational
array may be multiplied by any convenient common factor. In particular, the
x’s in the preceding table could be replaced by 20 and 30, and the entries in the
last column correspondingly by -3 and 7.

Unless f(x) is truly linear, the secant slope $f[x_0, x_1]$ will depend upon the
abscissas $x_0$ and $x_1$. However, if f(x) were a second-degree polynomial, the
secant-slope function $f[x_1, x]$ would itself be a linear function of x, for fixed $x_1$.
That is, the ratio

$$\frac{f[x_1, x_2] - f[x_0, x_1]}{x_2 - x_0}$$

would be independent of $x_0$, $x_1$, and $x_2$. This ratio is called the second divided
difference, relative to those three abscissas, and is designated here by
$f[x_0, x_1, x_2]$:

$$f[x_0, x_1, x_2] = \frac{f[x_1, x_2] - f[x_0, x_1]}{x_2 - x_0} \tag{2.2.9}$$

In particular, since $f[x_1, x_0, x] = f[x_0, x_1, x]$ (see Sec. 2.3), the difference
between the two members of (2.2.3) can be expressed as

$$f[x_0, x] - f[x_0, x_1] = f[x_0, x] - f[x_1, x_0] = (x - x_1)\,f[x_0, x_1, x]$$

so that the approximation (2.2.4) can be replaced by the identity

$$f(x) = f[x_0] + (x - x_0)\,f[x_0, x_1] + (x - x_0)(x - x_1)\,f[x_0, x_1, x] \tag{2.2.10}$$

Thus the error committed in (2.2.7), by replacing f(x) by $p_{0,1}(x)$, is given
by

$$E(x) = f(x) - p_{0,1}(x) = (x - x_0)(x - x_1)\,f[x_0, x_1, x] \tag{2.2.11}$$

Whereas knowledge of $f[x_0, x_1, x]$ is tantamount to knowledge of the exact
interpolant f(x), the form (2.2.11) of the error is a special case of a more general
form to be obtained, which (as will be shown) is frequently useful in obtaining
an estimate of the error in an actual calculation. For any linear function f(x),
the error term will indeed vanish identically, as may be verified directly.

Before generalizing the result just obtained, it is desirable to define
divided differences of all orders, and to investigate certain of their properties.

2.3 Divided Differences 


Divided differences of orders 0, 1, 2, ..., k are defined recursively by the
relations

$$f[x_0] = f(x_0) \qquad f[x_0, x_1] = \frac{f[x_1] - f[x_0]}{x_1 - x_0}$$

$$f[x_0, \ldots, x_k] = \frac{f[x_1, \ldots, x_k] - f[x_0, \ldots, x_{k-1}]}{x_k - x_0} \tag{2.3.1}$$

We notice that the first k - 1 arguments in the first term of the numerator
are the same as the last k - 1 arguments in the second term and that the
denominator is the difference between those arguments which are not in
common to the two terms. It is clear from the definition that $f[x_0, \ldots, x_k]$ is a
linear combination of the k + 1 ordinates $f(x_0), \ldots, f(x_k)$, with the coefficients
depending upon the corresponding k + 1 abscissas.
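The recursive definition translates directly into a column-by-column construction of a difference table. The sketch below is only an illustration; the function name `divided_difference_table`, the list-of-columns layout, and the test function $f(x) = x^2$ are assumptions made here, not notation from the text.

```python
def divided_difference_table(xs, fs):
    """Return the columns of the table: table[k][i] = f[x_i, ..., x_{i+k}]."""
    table = [list(fs)]                      # zeroth differences are the ordinates
    for k in range(1, len(xs)):
        prev = table[-1]
        # Difference of adjacent entries of order k-1, divided by the
        # difference of the two abscissas they do not have in common.
        column = [(prev[i + 1] - prev[i]) / (xs[i + k] - xs[i])
                  for i in range(len(xs) - k)]
        table.append(column)
    return table

# Illustrative data: f(x) = x^2 at the unequally spaced abscissas 0, 1, 3.
table = divided_difference_table([0, 1, 3], [0, 1, 9])
for order, column in enumerate(table):
    print(order, column)
```

Each entry depends only on the ordinates lying between the two diagonals through it, which is the linear-combination property noted above.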

When k = 1, the divided difference obviously is a symmetric function
of its arguments, that is, $f[x_1, x_0] = f[x_0, x_1]$. It is shown next that the same
statement applies to divided differences of all orders. In order to establish this
fact directly in the case of k = 2, we may write

$$f[x_0, x_1, x_2] = \frac{f[x_1, x_2] - f[x_0, x_1]}{x_2 - x_0} = \frac{1}{x_2 - x_0}\left[\frac{f(x_2) - f(x_1)}{x_2 - x_1} - \frac{f(x_1) - f(x_0)}{x_1 - x_0}\right]$$

and the result can be put in the symmetric form

$$f[x_0, x_1, x_2] = \frac{f(x_0)}{(x_0 - x_1)(x_0 - x_2)} + \frac{f(x_1)}{(x_1 - x_0)(x_1 - x_2)} + \frac{f(x_2)}{(x_2 - x_0)(x_2 - x_1)}$$

This result suggests the truth of the more general relation

$$f[x_0, \ldots, x_k] = \frac{f(x_0)}{(x_0 - x_1)\cdots(x_0 - x_k)} + \frac{f(x_1)}{(x_1 - x_0)\cdots(x_1 - x_k)} + \cdots + \frac{f(x_k)}{(x_k - x_0)\cdots(x_k - x_{k-1})} \tag{2.3.2}$$

for any positive integer k, so that the coefficient of $f(x_i)$ is

$$\frac{1}{(x_i - x_0)(x_i - x_1)\cdots(x_i - x_k)} \qquad (i = 0, 1, \ldots, k) \tag{2.3.3}$$

where the zero factor $(x_i - x_i)$ is to be omitted in the denominator.
In order to establish this conjecture by induction, suppose that it has been
proved for k = r. If we recall the definition

$$f[x_0, \ldots, x_{r+1}] = \frac{1}{x_{r+1} - x_0}\,\{f[x_1, \ldots, x_{r+1}] - f[x_0, \ldots, x_r]\} \tag{2.3.4}$$

it then follows that, for i = 1, 2, ..., r, the coefficient of $f(x_i)$ in the right-hand
member is given by

$$\frac{1}{x_{r+1} - x_0}\left[\frac{1}{(x_i - x_1)\cdots(x_i - x_{r+1})} - \frac{1}{(x_i - x_0)\cdots(x_i - x_r)}\right] = \frac{1}{(x_i - x_0)\cdots(x_i - x_{r+1})}$$

in accordance with (2.3.3) with k = r + 1. When i = 0 or r + 1, only one
of the terms in the right-hand member of (2.3.4) involves the ordinate $f(x_i)$,
and the respective coefficient also is easily seen to be in accordance with (2.3.3)
with k = r + 1. Thus, if (2.3.2) is valid for k = r, it is valid also for k = r + 1.
Since it has been established for k = 1 (and k = 2), it is therefore valid for any
positive integer k, as was to be shown.
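The agreement between the recursive definition (2.3.1) and the symmetric sum (2.3.2) can be spot-checked in exact rational arithmetic. The following Python sketch is a numerical illustration, not a proof; the function names, the cubic test function, and the chosen abscissas are all assumptions made here.

```python
from fractions import Fraction
from itertools import permutations

def dd_recursive(xs, fs):
    """Divided difference f[x_0, ..., x_k] from the recursive definition (2.3.1)."""
    if len(xs) == 1:
        return fs[0]
    upper = dd_recursive(xs[1:], fs[1:])
    lower = dd_recursive(xs[:-1], fs[:-1])
    return (upper - lower) / (xs[-1] - xs[0])

def dd_symmetric(xs, fs):
    """The same quantity from the symmetric sum (2.3.2)."""
    total = Fraction(0)
    for i, xi in enumerate(xs):
        denom = Fraction(1)
        for j, xj in enumerate(xs):
            if j != i:                      # omit the zero factor (x_i - x_i)
                denom *= xi - xj
        total += fs[i] / denom
    return total

def f(x):
    return x**3 - 2 * x                     # illustrative test function

abscissas = [Fraction(0), Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
reference = dd_symmetric(abscissas, [f(x) for x in abscissas])

# Evaluate the recursive form for every ordering of the same four arguments.
values = {order: dd_recursive(list(order), [f(x) for x in order])
          for order in permutations(abscissas)}
print(reference, set(values.values()))
```

Because `Fraction` arithmetic is exact, all 24 orderings produce literally the same number, so the symmetry is not obscured by roundoff.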

It follows, from the symmetry of (2.3.2), that the order of the arguments
is irrelevant. Hence $f[x_0, \ldots, x_k]$ can be expressed as the difference between
two divided differences of order k - 1, having any k - 1 of their k arguments
in common, divided by the difference between those arguments which are not in
common. For example, there follows

$$f[x_0, x_1, x_2, x_3] = \frac{f[x_1, x_2, x_3] - f[x_0, x_1, x_2]}{x_3 - x_0} = \frac{f[x_0, x_2, x_3] - f[x_1, x_2, x_3]}{x_0 - x_1} = \cdots \tag{2.3.5}$$

In those cases when two or more arguments in a divided difference become
coincident, recourse must be had to appropriate limiting processes. Thus, for
example, if we set $x_1 = x + \varepsilon$, there follows

$$f[x_1, x] = f[x + \varepsilon, x] = \frac{f(x + \varepsilon) - f(x)}{\varepsilon}$$

and, in the limit when $\varepsilon \to 0$, we have

$$f[x, x] = f'(x) \tag{2.3.6}$$

if f(x) is differentiable. A similar argument shows that

$$\frac{d}{dx}\,f[x_0, \ldots, x_k, x] = f[x_0, \ldots, x_k, x, x] \tag{2.3.7}$$

if $x_0, \ldots, x_k$ are constants. If $u_1, u_2, \ldots, u_n$ are differentiable functions of
x, there follows also

$$\frac{d}{dx}\,f[x_0, \ldots, x_k, u_1, \ldots, u_n] = \sum_{\nu=1}^{n} f[x_0, \ldots, x_k, u_1, \ldots, u_n, u_\nu]\,\frac{du_\nu}{dx}$$

and hence, by taking $u_1 = \cdots = u_n = x$, we may deduce that

$$\frac{d}{dx}\,f[x_0, \ldots, x_k, \underbrace{x, \ldots, x}_{n\ \text{times}}] = n\,f[x_0, \ldots, x_k, \underbrace{x, \ldots, x}_{n+1\ \text{times}}] \tag{2.3.8}$$

Finally, by successive differentiation of (2.3.7) combined with the use of (2.3.8)
at each step, we may establish the additional useful formula

$$\frac{d^r}{dx^r}\,f[x_0, \ldots, x_k, x] = r!\,f[x_0, \ldots, x_k, \underbrace{x, \ldots, x}_{r+1\ \text{times}}] \tag{2.3.9}$$

In particular, we may deduce that the result of allowing r + 1 arguments of a
divided difference to become coincident is finite if the rth derivative of f(x) is
finite at the point of confluence.
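The limiting behaviour can be observed numerically by letting the arguments approach one another. The sketch below is an illustration under stated assumptions: the helper names, the choice f(x) = sinh x, and the point x = 0.2 are introduced here. It compares $f[x, x + \varepsilon]$ with $f'(x)$, as in (2.3.6), and the second difference on three nearly coincident arguments with $f''(x)/2!$, the standard confluent value consistent with (2.3.9).

```python
import math

def first_dd(f, a, b):
    """First divided difference f[a, b]."""
    return (f(b) - f(a)) / (b - a)

def second_dd(f, a, b, c):
    """Second divided difference f[a, b, c]."""
    return (first_dd(f, b, c) - first_dd(f, a, b)) / (c - a)

x = 0.2
for eps in (1e-1, 1e-2, 1e-3):
    near_first = first_dd(math.sinh, x, x + eps)
    near_second = second_dd(math.sinh, x, x + eps, x + 2 * eps)
    # Deviations from f'(x) = cosh x and from f''(x)/2 = sinh(x)/2.
    print(eps, near_first - math.cosh(x), near_second - math.sinh(x) / 2)
```

Both deviations shrink roughly in proportion to the spacing. Taking the spacing much smaller would eventually let roundoff in the subtractions dominate, which is why the limit is better handled analytically.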

It is seen that $f[x_0, \ldots, x_k, x]$ is continuous at $x = \bar{x}$ if $\bar{x}$ is not identified
with $x_0, x_1, \ldots,$ or $x_k$, and if f(x) is continuous at $\bar{x}$. If f'(x) does not exist
at $x_0$, the function $f[x_0, \ldots, x_k, x]$ generally will not tend to a finite limit as
$x \to x_0$. Thus, for example, if $f(x) = \sqrt{x}$, there follows $f[0, x] = 1/\sqrt{x}$,
and this function naturally becomes infinite as $x \to 0$.

However, since the product

$$(x - x_0)\,f[x_0, \ldots, x_k, x] \equiv P(x) \tag{2.3.10}$$

is identical with the difference

$$f[x_1, \ldots, x_k, x] - f[x_0, x_1, \ldots, x_k]$$

it follows that P(x) vanishes when $x = x_0$ for any function f(x) defined at
$x_0, \ldots, x_k$, when these abscissas are distinct. Also P(x) tends to zero as $x \to x_0$
(and hence is continuous at $x_0$) if f(x) is continuous at $x_0$. Further, if $x_0$ and r
of the other k abscissas are equal, it follows also that $P(x_0) = 0$ if $f^{(r)}(x_0)$
exists and that $P(x) \to 0$ as $x \to x_0$ (and hence is continuous at $x_0$) if $f^{(r)}(x)$ is
continuous at $x_0$.

It may be expected that the kth divided difference of a polynomial of
degree n is a polynomial of degree n - k if k ≤ n, and is identically zero if
k > n. The proof follows easily from the fact that the first divided difference
of $x^m$

$$\frac{x^m - x_0^m}{x - x_0} = x^{m-1} + x_0 x^{m-2} + \cdots + x_0^{m-1}$$

is a polynomial of degree m - 1 in x, when m is a positive integer. In particular,
it is seen that

$$f[x_0, x_1, \ldots, x_n] = 1 \qquad \text{if } f(x) = x^n \tag{2.3.11}$$
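These statements about polynomials can be confirmed exactly for a particular case. The following Python sketch is illustrative only; the function name, the abscissas, and the exponents are assumptions chosen here. It checks the factorization of the first divided difference of $x^m$, and evaluates divided differences of successive orders for $f(x) = x^4$.

```python
from fractions import Fraction

def divided_difference(xs, f):
    """f[x_0, ..., x_k] from the recursive definition (2.3.1), in exact arithmetic."""
    if len(xs) == 1:
        return f(xs[0])
    upper = divided_difference(xs[1:], f)
    lower = divided_difference(xs[:-1], f)
    return (upper - lower) / (xs[-1] - xs[0])

# First divided difference of x^m versus the sum x^(m-1) + x0 x^(m-2) + ... + x0^(m-1).
x0, x, m = Fraction(2), Fraction(5), 4
lhs = (x**m - x0**m) / (x - x0)
rhs = sum(x0**j * x**(m - 1 - j) for j in range(m))

# Divided differences of orders 0..5 of f(x) = x^4 on arbitrary distinct abscissas.
points = [Fraction(p) for p in (-2, -1, 0, 1, 3, 5)]
n = 4
orders = {k: divided_difference(points[:k + 1], lambda t: t**n)
          for k in range(len(points))}
print(lhs, rhs, orders)
```

The difference of order n comes out as exactly 1 and the difference of order n + 1 as exactly 0, whatever distinct abscissas are chosen.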
2.4 Second-degree Interpolation


If the accuracy afforded by a linear interpolation is inadequate, a generally
more accurate result may be based upon the supposition that f(x) may be
approximated by a polynomial of second degree near the abscissa of the
interpolate. This is equivalent to assuming that, within a certain prescribed tolerance,
the first divided difference $f[x, x_0]$ is a linear function of x for fixed $x_0$ or,
equivalently, that the second divided difference $f[x, x_0, x_1]$ is constant. The
hypothesis

$$f[x, x_0, x_1] \approx f[x_2, x_0, x_1] = f[x_0, x_1, x_2] \tag{2.4.1}$$

then takes the form

$$\frac{f[x, x_0] - f[x_0, x_1]}{x - x_1} \approx f[x_0, x_1, x_2]$$

or, after another reduction,

$$f(x) \approx p_{0,1,2}(x) = f[x_0] + (x - x_0)\,f[x_0, x_1] + (x - x_0)(x - x_1)\,f[x_0, x_1, x_2] \tag{2.4.2}$$

Since the difference between the two members of (2.4.1) is expressible as
$(x - x_2)\,f[x_0, x_1, x_2, x]$, the error in the approximation (2.4.2) is given by

$$E(x) = (x - x_0)(x - x_1)(x - x_2)\,f[x_0, x_1, x_2, x] \tag{2.4.3}$$

From this result, we may deduce that E(x) = 0 if f(x) is a polynomial of
degree 2 or less, and that E = 0 when $x = x_0$, $x_1$, or $x_2$ for any function f(x)
which is defined for those arguments. Thus $p_{0,1,2}(x)$ is a polynomial of degree 2
which agrees with f(x) when $x = x_0$, $x_1$, and $x_2$.

In order to make use of (2.4.2), one may first form a divided-difference
table as follows:

    a_0 | f(a_0)
                    f[a_0, a_1]
    a_1 | f(a_1)                   f[a_0, a_1, a_2]
                    f[a_1, a_2]
    a_2 | f(a_2)

Here each entry is given by the difference between diagonally adjacent entries
to its left divided by the difference between the abscissas corresponding to the
ordinates intercepted by the diagonals passing through the calculated entry.

Thus, for f(x) = sinh x, the following table may be formed, in illustration,
with the abscissas 0.0, 0.2, and 0.3:

    a_0 = 0.00 | 0.00000
                            1.0067
    a_1 = 0.20 | 0.20134             0.08367
                            1.0318
    a_2 = 0.30 | 0.30452

Suppose that only the given data are available and that the value of f(0.23)
is to be interpolated. If we take $x_k = a_k$, the calculation from (2.4.2) is of the
form

    f(0.23) ≈ 0.00000 + (0.23)(1.0067) + (0.23)(0.03)(0.08367)
            = 0.00000 + 0.231541 + 0.000577 = 0.23212

with an associated error of -0.00009. (One extra place was carried through
the intermediate calculation, with the final result rounded to five places.)

By renumbering the x’s, the calculation can be rearranged in various ways.
For example, since the argument of the interpolant is nearest $a_1$, it may be
suggested that we take $x_0 = a_1$ and, say, $x_1 = a_2$ and $x_2 = a_0$. In this case,
there follows

    f(0.23) ≈ 0.20134 + (0.03)(1.0318) + (0.03)(-0.07)(0.08367)
            = 0.20134 + 0.030954 - 0.000176 = 0.23212

with the same end result. The first calculation uses differences on the indicated
forward diagonal starting from $f(a_0)$; the second uses differences on the indicated
zigzag path starting from $f(a_1)$. By further renumbering, other paths also
terminating with $f[a_0, a_1, a_2]$ could be selected, all of which would give exactly
the same value of the interpolant if no intermediate roundoff errors were present.
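The independence of the result from the chosen path can be reproduced in a few lines. The sketch below is illustrative; the names `newton_quadratic`, `interpolate`, and `table` are assumptions introduced here. It evaluates (2.4.2) with the three abscissas taken in two different orders, using unrounded divided differences of the tabulated values.

```python
def newton_quadratic(xs, fs, x):
    """Evaluate (2.4.2) with the abscissas taken in the order given."""
    d01 = (fs[1] - fs[0]) / (xs[1] - xs[0])      # f[x_0, x_1]
    d12 = (fs[2] - fs[1]) / (xs[2] - xs[1])      # f[x_1, x_2]
    d012 = (d12 - d01) / (xs[2] - xs[0])         # f[x_0, x_1, x_2]
    return fs[0] + (x - xs[0]) * d01 + (x - xs[0]) * (x - xs[1]) * d012

# Five-place values of sinh x from the table in the text.
table = {0.0: 0.00000, 0.2: 0.20134, 0.3: 0.30452}

def interpolate(order, x):
    return newton_quadratic(list(order), [table[a] for a in order], x)

forward = interpolate((0.0, 0.2, 0.3), 0.23)     # x_k = a_k
zigzag = interpolate((0.2, 0.3, 0.0), 0.23)      # x_0 = a_1, x_1 = a_2, x_2 = a_0
print(round(forward, 5), round(zigzag, 5))
```

The two orderings agree to within floating-point roundoff, and both round to the five-place value obtained in the hand calculations above.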

The second path is the one which departs least from an imaginary 
horizontal line through the argument of the interpolant. Accordingly, the new 
information introduced at each stage of the calculation is that which may be 
expected to be most relevant to that interpolant, so that the rate of approach 
to the final value may be expected to be maximized at each step of the path. 
In addition, since the coefficients by which the successive divided differences 
are multiplied are smaller in magnitude along the preferred path, the effects of 
roundoffs introduced in the calculation of those divided differences will be 
somewhat reduced.†


† In this connection, it should be mentioned that if divided differences of rounded
values (not rounded divided differences of true values) are used, if the results of the 
divisions do not require additional roundoffs, and if all following calculations are 
effected without roundoff, all paths which incorporate the same given data will lead 
to exactly the same end results. Thus the preferred path does not minimize the 
effects of inherent errors in the given data (as is sometimes argued). Those effects 
depend only upon the end point of the path and are considered in Sec. 3.2. 
If the value of f(0.10) were required, from the given data alone, the first
path would be the preferred one from the preceding point of view and would
lead to the calculation

    f(0.10) ≈ 0.00000 + (0.10)(1.0067) + (0.10)(-0.10)(0.08367)
            = 0.00000 + 0.10067 - 0.000837 = 0.09983

whereas the true five-place value is 0.10017. Finally, to interpolate for f(0.27), a
path along the backward diagonal starting with $f(a_2)$ is preferable. Hence we
would set $x_0 = a_2$, $x_1 = a_1$, and $x_2 = a_0$, and would obtain

    f(0.27) ≈ 0.30452 + (-0.03)(1.0318) + (-0.03)(0.07)(0.08367)
            = 0.30452 - 0.030954 - 0.000176 = 0.27339

as compared with the true five-place value 0.27329.

In the preceding calculations, and in similar ones, when the number of 
differences to be retained has been decided in advance, and when the end point 
of the path is also predetermined, the reduction in loss of accuracy afforded by 
the preferred path usually is of no great consequence and the rate of approach 
to the final value at intermediate stages is irrelevant to the final result. Thus, 
the choice of paths then is relatively unimportant. However, in the more in- 
volved cases when differences of higher order are available, and when the point 
at which the path is to be terminated is not preassigned, it is desirable to choose 
that path which, when terminated after any number of steps, may be expected to 
afford the best result obtainable with that number of steps. The preceding 
examples were intended to illustrate such paths in simple cases. 


2.5 Newton’s Fundamental Formula 


The identities (2.2.10) and (2.4.2) are special cases of a general formula, due to
Newton, which may be derived as follows.

From the basic definition (2.3.1), there follows

$$\begin{aligned}
f(x) &= f[x_0] + (x - x_0)\,f[x_0, x] \\
f[x_0, x] &= f[x_0, x_1] + (x - x_1)\,f[x_0, x_1, x] \\
&\;\;\vdots \\
f[x_0, \ldots, x_{n-1}, x] &= f[x_0, \ldots, x_n] + (x - x_n)\,f[x_0, \ldots, x_n, x]
\end{aligned} \tag{2.5.1}$$

By substituting the second relation in the first, one obtains (2.2.10),

$$f(x) = f[x_0] + (x - x_0)\,f[x_0, x_1] + (x - x_0)(x - x_1)\,f[x_0, x_1, x]$$

and, by successively substituting from subsequent relations in (2.5.1), there
follows finally

$$\begin{aligned}
f(x) = f[x_0] &+ (x - x_0)\,f[x_0, x_1] + (x - x_0)(x - x_1)\,f[x_0, x_1, x_2] \\
&+ \cdots + (x - x_0)\cdots(x - x_{n-1})\,f[x_0, \ldots, x_n] + E(x)
\end{aligned} \tag{2.5.2}$$

where

$$E(x) = (x - x_0)\cdots(x - x_n)\,f[x_0, \ldots, x_n, x] \tag{2.5.3}$$

The obvious details of the induction are omitted.

The approximate relation obtained by suppressing the error term in
(2.5.2) is known as Newton’s interpolation formula with divided differences.
The resultant right-hand member, which is clearly a polynomial of degree n,
may be denoted by $p_{0,\ldots,n}(x)$. An inspection of the error term then shows that
$p_{0,\ldots,n}(x)$ is identical with f(x) if f(x) is a polynomial of degree n or less, and
that it agrees with f(x) at the n + 1 points $x = x_0, \ldots, x_n$, regardless of the
form of f(x). Further, there exists no other polynomial P(x) of degree n or less
having this property, since, if this were the case, P - p would be a polynomial
of maximum degree n with n + 1 zeros. This situation is impossible unless
P - p vanishes identically.

Thus, if f(x) is known at n + 1 distinct points $a_0, a_1, \ldots, a_n$, where
$a_0 < a_1 < \cdots < a_n$, a variety of equivalent forms of the interpolation
polynomial $p_{0,\ldots,n}(x)$ of degree n (or less) which agrees with f(x) at these points
can be obtained by identifying each of the x’s in (2.5.2) with one of the a’s.
The various possible forms are not considered here in explicit detail. However,
in Chap. 4 a more detailed consideration is given to the situation in which the
abscissas $a_0, \ldots, a_n$ are equally spaced, so that certain simplifications are
possible, and convenient use can be made of available tables of certain coefficient
functions.

In illustration, we suppose that values of sinh x are given to five places
for x = 0.0, 0.20, 0.30, and 0.50, and that sinh 0.23 is required by use of
third-degree interpolation. The calculation may be arranged as follows:

    x_2 = 0.00 | 0.00000
                            1.0067
    x_0 = 0.20 | 0.20134             0.08367
                            1.0318               0.17333
    x_1 = 0.30 | 0.30452             0.17033
                            1.0829
    x_3 = 0.50 | 0.52110

    f(0.23) ≈ 0.20134 + (0.03)(1.0318) + (0.03)(-0.07)(0.08367)
              + (0.03)(-0.07)(0.23)(0.17333)
            = 0.20134 + 0.030954 - 0.000176 - 0.000084 = 0.23203


The initial point x) was taken to be as near as possible to the argument of the 
interpolant, and the remaining abscissas were numbered in accordance with the 
indicated zigzag path of differences. The same end result, which is correct to 
the five places given, could also have been obtained by any one of a number of 
other orderings of the abscissas. 
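The arithmetic of this illustration can be checked mechanically. The following Python sketch is an illustration only and is not part of the original text; the function names `divided_differences` and `newton_eval` are assumed names chosen here. It forms the divided differences along the chosen ordering of the abscissas and evaluates the Newton form by nested multiplication.

```python
def divided_differences(xs, ys):
    """Return [f[x0], f[x0,x1], ..., f[x0,...,xn]] for the abscissas in the given order."""
    coef = list(ys)
    n = len(xs)
    for order in range(1, n):
        # Work from the bottom up so that lower-order entries are still available.
        for i in range(n - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef


def newton_eval(xs, coef, x):
    """Evaluate the Newton form of the interpolation polynomial by nested multiplication."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result


# Abscissas numbered along the zigzag path: x0 = 0.20, x1 = 0.30, x2 = 0.00, x3 = 0.50.
xs = [0.20, 0.30, 0.00, 0.50]
ys = [0.20134, 0.30452, 0.00000, 0.52110]

coef = divided_differences(xs, ys)      # approximately 0.20134, 1.0318, 0.08367, 0.17333
value = newton_eval(xs, coef, 0.23)     # approximately 0.23203

# Any other ordering of the same abscissas leads to the same polynomial.
xs_sorted = [0.00, 0.20, 0.30, 0.50]
ys_sorted = [0.00000, 0.20134, 0.30452, 0.52110]
value_sorted = newton_eval(xs_sorted, divided_differences(xs_sorted, ys_sorted), 0.23)

print(coef, value, value_sorted)
```

The two orderings agree to within floating-point roundoff, in accordance with the remark that the end result does not depend on the path chosen.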

Once an appropriate continuous path of differences (made up of diagonal 
segments, each sloping upward or downward to the right) has been selected, 
reference to (2.5.2) shows that the coefficient of the kth difference encountered 
is the product of k factors, each of which represents the difference between the 
abscissa of the interpolant and the abscissa of an ordinate used in the formation 
of a difference previously encountered. The instructions comprised in this state- 
ment are frequently referred to as Sheppard’s rules. 

It is convenient to speak of the data lying inside and on the boundary 
of the triangular region, limited by the column of ordinates (zeroth differences) 
in a difference table and the two diagonals passing through a specific difference 
in that table, as comprising the region of determination for that difference. It 
is then easily seen that the ordinates involved in the formation of any difference 
are exactly those ordinates which lie in its region of determination. Further, 
for a difference path of the sort considered here, the region of determination of 
the kth difference encountered includes the regions relevant to all differences
previously encountered. 

These facts permit us to write down, by inspection, the coefficient of any 
difference encountered in a chosen path. For example, in order to obtain the 
coefficient of 0.08367 in the preceding calculation, we notice that the region 
of determination for the preceding difference in the path (1.0318) includes the 
ordinates corresponding to the abscissas 0.20 and 0.30. Hence the desired 
coefficient is 


(0.23 — 0.20)(0.23 — 0.30) = —0.0021 
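Sheppard's rules lend themselves to a very short computation. The following Python sketch is illustrative only; the name `path_coefficients` is an assumption made here, and the list passed to it is simply the abscissas in the order in which the chosen path brings them into the calculation.

```python
from math import prod


def path_coefficients(x, ordered_abscissas):
    """Coefficient of the kth difference on the path: the product of (x - x_j)
    over the k abscissas whose ordinates have already been used."""
    return [prod(x - xj for xj in ordered_abscissas[:k])
            for k in range(len(ordered_abscissas))]


# Path of the sinh x illustration: ordinates at 0.20, 0.30, 0.00, 0.50 enter in turn.
coeffs = path_coefficients(0.23, [0.20, 0.30, 0.00, 0.50])
print(coeffs)   # the third entry is (0.23 - 0.20)(0.23 - 0.30) = -0.0021
```

The zeroth entry is the empty product, 1, which multiplies the initial ordinate itself.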


2.6 Error Formulas 


It was shown in the preceding section that, if f(x) is approximated by a polynomial
$y(x) = p_{0,\ldots,n}(x)$ of maximum degree n, which coincides with it at the
n + 1 distinct points $x_0, \ldots, x_n$, then the error E(x) = f(x) - y(x) is given by


$$E(x) = \pi(x)\, f[x_0, \ldots, x_n, x] \tag{2.6.1}$$

where $\pi(x)$ is the monic polynomial of degree n + 1 defined as the product

$$\pi(x) = (x - x_0)(x - x_1) \cdots (x - x_n) \tag{2.6.2}$$


This form of the error term will be particularly useful in considering the accuracy 
of formulas for numerical differentiation and integration in subsequent chapters. 

However, if f(x) possesses a derivative of order n + 1 (and hence also
continuous derivatives of order 1, 2, ..., n) in the relevant interval, there exists
another form of the remainder which often is more useful in other considerations.†
In order to obtain it, we notice first that both f(x) - y(x) and $\pi(x)$
vanish at the n + 1 points $x = x_0, x_1, \ldots, x_n$. We then consider a linear
combination of these functions


$$F(x) = f(x) - y(x) - K\pi(x) \tag{2.6.3}$$


and determine the constant K in such a way that F(x) vanishes, not only at
these n + 1 points, but also at an arbitrarily chosen point $\bar{x}$ which differs
from all these points. Since $\pi(x)$ vanishes only at the n + 1 points considered
previously, K certainly can be so chosen.

Let I designate the closed interval limited by the smallest and largest of
the n + 2 values $x_0, \ldots, x_n, \bar{x}$. Then F(x) vanishes at least n + 2 times on
the interval I. By Rolle's theorem (Sec. 1.9), F'(x) vanishes at least n + 1
times inside I, F''(x) at least n times, ..., and hence, finally, $F^{(n+1)}(x)$ vanishes
at least once inside I. Let one such point be denoted by $\xi$. There then follows,
from (2.6.3),


$$0 = f^{(n+1)}(\xi) - y^{(n+1)}(\xi) - K\pi^{(n+1)}(\xi) \tag{2.6.4}$$


But since y(x) is a polynomial of maximum degree n, its (n + 1)th derivative
vanishes identically. Also, from the definition (2.6.2), there follows
$\pi^{(n+1)}(x) = (n + 1)!$. Hence (2.6.4) yields the determination


$$K\,(n + 1)! = f^{(n+1)}(\xi)$$


and the relation $F(\bar{x}) = 0$ becomes


$$f(\bar{x}) - y(\bar{x}) = \frac{1}{(n+1)!}\, f^{(n+1)}(\xi)\, \pi(\bar{x})$$


for some $\xi$ in I. If $\bar{x}$ is identified with any one of the abscissas $x_0, \ldots, x_n$, both
sides of this relation vanish, so that it is valid even in that previously excluded
case. The bars now may be suppressed, and there follows finally


$$E(x) = \frac{1}{(n+1)!}\, f^{(n+1)}(\xi)\, \pi(x) \tag{2.6.5}$$

for some $\xi = \xi(x)$ in the interval I, where I is the interval limited by the largest
and smallest of the numbers $x_0, x_1, \ldots, x_n, x$.


† A third form, involving analysis in the complex plane, is derived in Chap. 4, Prob. 39.

This result guarantees merely that, for any given x, there exists at least
one corresponding number $\xi$ in I such that the error is expressible in the given
form. If $f^{(n+1)}(x)$ is continuous on the closed interval I (that is, the interior of I
plus its end points), then $f^{(n+1)}(x)$ is bounded on I. In particular, there then
exists a positive number $M_{n+1}$ such that


$$|f^{(n+1)}(\xi)| \le M_{n+1} \tag{2.6.6}$$

in (2.6.5) and hence

$$|E(x)| \le \frac{M_{n+1}}{(n+1)!}\, |\pi(x)| \tag{2.6.7}$$

for all x in I. In order that this result hold, we will generally require in the
sequel, not only that $f^{(n+1)}(x)$ exist, but also that it possess the desired continuity.


Since (2.6.1) and (2.6.5) must be equivalent, we obtain as a by-product
the useful fact that


$$f[x_0, \ldots, x_n, x] = \frac{1}{(n+1)!}\, f^{(n+1)}(\xi) \tag{2.6.8}$$

for some argument $\xi$ in the interval I, whenever $f^{(n+1)}(x)$ exists in I. This fact
will be needed in later developments. It can be seen that (2.6.8) continues to
be valid when certain of the arguments $x_0, \ldots, x_n, x$ coalesce.

In order to illustrate the application of the error formula (2.6.5), we 
consider the second-degree interpolation (n = 2) for f(0.23) effected in Sec. 2.4. 
Under the assumption that the analytic expression for the interpolated function 
is known to be f(x) = sinh x, there follows also $f'''(x) = \cosh x$. Thus the
error committed is given by


$$E(0.23) = \frac{1}{3!}\,(0.23 - 0.00)(0.23 - 0.20)(0.23 - 0.30)\cosh\xi = -0.0000805\cosh\xi$$


for some $\xi$ such that $0 < \xi < 0.30$. It happens in this case that cosh x may
be computed at the tabular points from the given data, by use of the formula
$\cosh x = (1 + \sinh^2 x)^{1/2}$, and the range of cosh x over the given interval
is thus found (without the need for additional data, but with use of the fact that
cosh x increases throughout the interval) to be between 1 and 1.04534. Thus
there follows

$$-0.0000842 < E(0.23) < -0.0000805$$


so that the error in the last place retained in the calculation should be -8.
Actually, the error was found to be -9 in the fifth place. The discrepancy is
due, not to roundoffs in calculation (which were sufficiently controlled by
retention of a sixth digit, as may be verified), but to the fact that each of the
original data possesses a roundoff error which may be as large as $5 \times 10^{-6}$.
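The bounds just obtained can be reproduced numerically. The Python sketch below is an illustration under stated assumptions, not part of the original text: the helper `lagrange_eval` is an assumed name, and the final check uses the library value of sinh x in place of the rounded tabular data, so that only the truncation error is involved.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the interpolation polynomial through (xs, ys) at x (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


nodes = [0.00, 0.20, 0.30]
x = 0.23

# pi(x)/(n + 1)! with n = 2; this is the factor -0.0000805 of the text.
pi_x = math.prod(x - xj for xj in nodes)
factor = pi_x / math.factorial(len(nodes))

# Range of cosh x on [0, 0.30], from the tabulated values of sinh x alone.
cosh_min = math.sqrt(1.0 + 0.00000 ** 2)
cosh_max = math.sqrt(1.0 + 0.30452 ** 2)
lower, upper = factor * cosh_max, factor * cosh_min

# Truncation error when the ordinates are exact (library sinh, no data roundoff).
exact_error = math.sinh(x) - lagrange_eval(nodes, [math.sinh(t) for t in nodes], x)

print(lower, exact_error, upper)
```

With exact ordinates the truncation error falls between the two bounds, as (2.6.5) requires; the additional unit in the fifth place observed in the text arises only from the roundoff in the tabulated data.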

In other applications of interpolation, the analytic expression for f(x)
may not be known, and hence it may be impossible to determine the range of
possible values of $f^{(n+1)}(x)$ in order to bound the error E. In such cases, the
relation (2.6.1) may be more useful. For, if sufficient data are available to
permit the evaluation of one or more sample values of the (n + 1)th divided
difference, these values may be taken as estimates of the value of the divided
difference which is actually relevant to (2.6.1). Thus, from the data obtained
in Sec. 2.5, the divided difference


f[0.00, 0.20, 0.30, 0.50] = 0.17333


may serve as an estimate of the required value f[0.00, 0.20, 0.30, 0.23], leading
to the error estimate


$$E(0.23) \approx 0.17\,\pi(0.23) \approx (-0.00048)(0.17) \approx -0.00008$$


The fact that this estimate is indeed good in this case is a consequence 
of the fact that the third derivative, and hence also the third divided difference, 
does not vary greatly in the range considered. It may be noticed that this error 
estimate is precisely the correction term which was involved in the calculation 
of Sec. 2.5 as a result of incorporating the contribution of the third difference. 
More generally, a consideration of (2.5.2) and (2.5.3) shows that if an interpolation
for f(x) is made, terminating with an nth difference, the error committed
is given exactly by the product of the calculable number $\pi(x)$ and the (n + 1)th
difference $f[x_0, \ldots, x_n, x]$, which is not calculable unless f(x) is known. On
the other hand, the first term omitted in a calculation based on (2.5.2) is the
product of $\pi(x)$ and the (n + 1)th difference $f[x_0, \ldots, x_n, x_{n+1}]$. If
$f[x_0, \ldots, x_n, x]$ does not vary markedly over an interval including the argument
of the interpolant and $x = x_{n+1}$, this first term omitted will indeed supply a good
estimate of the error. This situation certainly will exist, in particular, in consequence
of (2.6.8), if $f^{(n+1)}(x)$ does not vary markedly over an interval I including
$x_0, \ldots, x_{n+1}$ and x.
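The estimate described above, in which the first term omitted stands in for the error, may be sketched as follows for the sinh x data. The code is illustrative only; `newton_coefficients` is an assumed name, and the abscissas are taken in the order used in Sec. 2.5.

```python
def newton_coefficients(xs, ys):
    """Divided differences f[x0], f[x0,x1], ... for the abscissas in the given order."""
    coef = list(ys)
    for order in range(1, len(xs)):
        for i in range(len(xs) - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef


xs = [0.20, 0.30, 0.00, 0.50]
ys = [0.20134, 0.30452, 0.00000, 0.52110]
x = 0.23

coef = newton_coefficients(xs, ys)

# pi(x) for the second-degree interpolation uses the first three abscissas only.
pi_x = (x - xs[0]) * (x - xs[1]) * (x - xs[2])

# First omitted term: pi(x) times the sample third divided difference.
estimate = pi_x * coef[3]

# The same number is the change produced by passing from degree two to degree three.
p2 = coef[0] + (x - xs[0]) * (coef[1] + (x - xs[1]) * coef[2])
p3 = p2 + estimate

print(pi_x, estimate, p2, p3)
```

The estimate is about -0.00008, in agreement with the bounds derived from (2.6.5) for this case.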

It may be noticed that, as n increases without limit, the length of the
interval I, as well as that of the interval limited by x and $x_{n+1}$, generally will also
increase without limit, since the later abscissas introduced generally are more
remote from x, so that the uncertainty of this particular error estimate may be
expected to increase. In fact, in many cases the result of omitting the error term
in (2.5.2), and allowing n to become infinite, leads to an infinite interpolation
series which is itself not convergent. That is, the error E(x) associated with
retention of differences of order not greater than n very often does not tend to
zero as n increases without limit.† However, if the abscissas $x_0, x_1, x_2, \ldots$
are appropriately ordered, it is usually true that the magnitude of the error E
first decreases fairly rapidly with increasing n, but then increases in magnitude
as n continues to increase. In most practical cases, the minimal error is extremely
small, and the minimal stage occurs for a value of n so large that it is
not actually encountered.

In view of this situation, the error E(x) is not generally one which can be 
reduced in magnitude within an arbitrarily prescribed tolerance by increasing 
the number of differences retained. Thus, although this error is commonly 
known as the truncation error, it should be noticed again that this terminology 
often is somewhat misleading in that it would seem to imply an error committed 
by truncating a convergent infinite sequence of calculations after a finite number 
of steps. 

As in Sec. 1.3, we continue to define a truncation error as any error which 
would be present even in the ideal case when the given data are exact and in- 
finitely many decimal places are retained in the calculations, and we shall refer 
to E(x) as a truncation error in this general sense. The superimposed effects 
of roundoff errors may be of equal or greater importance. In fact, the most 
efficient procedure is frequently that one in which the maximum (or RMS) 
errors due to truncation and to roundoff are of the same magnitude. 


2.7 Iterated Interpolation 


In Sec. 2.2, it was shown that linear interpolation can be conveniently effected
by use of the formula

$$p_{0,1}(x) = \frac{1}{x_1 - x_0}\begin{vmatrix} p_0(x) & x_0 - x \\ p_1(x) & x_1 - x \end{vmatrix} \tag{2.7.1}$$

where $p_0(x)$ and $p_1(x)$ are two independent interpolation polynomials of degree 0,

$$p_0(x) = f(x_0) \qquad p_1(x) = f(x_1) \tag{2.7.2}$$


† The series obviously terminates if f(x) is a polynomial, and is a convergent infinite
series in certain other cases. Some information with regard to this question is given
in Sec. 4.11.

In the same way, quadratic interpolation can be effected by linear interpolation
over two independent linear interpolation polynomials, so that, for
example,

$$p_{0,1,2}(x) = \frac{1}{x_2 - x_0}\begin{vmatrix} p_{0,1}(x) & x_0 - x \\ p_{1,2}(x) & x_2 - x \end{vmatrix} = \frac{1}{x_2 - x_1}\begin{vmatrix} p_{0,1}(x) & x_1 - x \\ p_{0,2}(x) & x_2 - x \end{vmatrix} \tag{2.7.3}$$


In order to verify this fact directly, we may notice, for example, that the first
right-hand member of (2.7.3) is a polynomial of second degree, that it obviously
takes on the values $f(x_0)$ and $f(x_2)$ when $x = x_0$ and $x = x_2$, respectively,
and that when $x = x_1$ it correctly takes on the value

$$\frac{1}{x_2 - x_0}\begin{vmatrix} f(x_1) & x_0 - x_1 \\ f(x_1) & x_2 - x_1 \end{vmatrix} = f(x_1)$$

In a similar way, we may effect cubic interpolation by linear interpolation 
over two independent quadratic interpolation polynomials, and so forth (see 
Prob. 38). This procedure is particularly useful in machine calculation for the 
purpose of generating a sequence of interpolates from which the rate of effective 
convergence can be estimated in cases when use cannot be made of analytic 
error bounds. 

In Aitken’s method, the first four stages of the calculation would be tab- 
ulated as follows for desk calculation: 


x_0   p_0                                                          x_0 - x
x_1   p_1   p_{0,1}                                                x_1 - x
x_2   p_2   p_{0,2}   p_{0,1,2}                                    x_2 - x
x_3   p_3   p_{0,3}   p_{0,1,3}   p_{0,1,2,3}                      x_3 - x
x_4   p_4   p_{0,4}   p_{0,1,4}   p_{0,1,2,4}   p_{0,1,2,3,4}      x_4 - x


Here, for example, the entry $p_{0,1,3}$ would be obtained by evaluating the
determinant

$$\begin{vmatrix} p_{0,1} & x_1 - x \\ p_{0,3} & x_3 - x \end{vmatrix}$$

the elements of which are seen to be conveniently located in the above array,
and dividing the result by $x_3 - x_1$. Here an additional convenience is afforded
by the fact that this divisor can be obtained as the difference $(x_3 - x) -
(x_1 - x)$ between the entries in the right-hand column.

The abscissas labeled as $x_0, \ldots, x_n$ may be arranged in any algebraic
order; the final value $p_{0,\ldots,n}$ is independent of that arrangement (barring
the effects of intermediate roundoffs). However, it is often desirable to designate
the abscissa nearest the argument of the interpolant $\bar{x}$ by $x_0$, the second nearest
by $x_1$, and so forth. For then the entries $p_0$, $p_{0,1}$, $p_{0,1,2}$, and so forth, may be
expected to represent the best possible estimates, based on the given data,
which can be afforded by polynomial interpolation of degree zero, one, two,
and so forth. Also, each such estimate makes use of all the information used
in the preceding estimate, together with one additional datum. Thus the rate
of effective convergence can be fairly confidently estimated by considering the
sequence of entries in the diagonal of the table.


† The phrase effective convergence will be used in accordance with the generally
asymptotic nature of the sequence.

For the interpolation problem considered in Sec. 2.5, the work could be 
arranged as follows, through the third-degree calculation: 


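 20   0.20134                                       -3
 30   0.30452   0.232294                             7
  0   0.00000       1541   0.232118                -23
 50   0.52110       3316       1936   0.232034      27

(Here the abscissas and the entries of the last column are expressed in units
of 0.01, and entries such as 1541 abbreviate 0.231541.)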


In the absence of further information, the correctness of the fourth place prob- 
ably would be presumed, whereas the fifth place would be considerably in 
doubt. In order to decrease the uncertainty, further information would be 
needed. If, for example, f(0.60) were also available, an additional row of entries 
then would be calculated, as follows: 


60 | 0.63665 0.233988 0.231899 0.232034 0.232034 | 37 


Thus the value 0.23203 appears to be stabilized as the five-digit interpolate 
corresponding to the given data. 
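The array of Aitken's method is easily generated by machine. The following Python sketch is illustrative only; the name `aitken_table` and the indexing convention `table[i][k]` are assumptions made here. Each new entry is the determinant described above, divided by the difference of the two abscissas involved.

```python
def aitken_table(xs, ys, x):
    """Return the triangular Aitken array for interpolation at x.

    table[i][0] is the ordinate p_i, and table[i][k] is p_{0,1,...,k-1,i}(x),
    so that the diagonal entries table[k][k] are p_0, p_{0,1}, p_{0,1,2}, ...
    """
    n = len(xs)
    table = [[ys[i]] for i in range(n)]
    for k in range(1, n):
        for i in range(k, n):
            a = table[k - 1][k - 1]     # p_{0,...,k-1}
            b = table[i][k - 1]         # p_{0,...,k-2,i}
            # Determinant | a  x_{k-1} - x ; b  x_i - x | divided by x_i - x_{k-1}.
            value = (a * (xs[i] - x) - b * (xs[k - 1] - x)) / (xs[i] - xs[k - 1])
            table[i].append(value)
    return table


# Data of Sec. 2.5, ordered by nearness of the abscissa to the argument 0.23.
xs = [0.20, 0.30, 0.00, 0.50]
ys = [0.20134, 0.30452, 0.00000, 0.52110]

table = aitken_table(xs, ys, 0.23)
diagonal = [table[k][k] for k in range(len(xs))]
print(diagonal)   # successive interpolates of degree 0, 1, 2, 3
```

A further datum, such as the ordinate at 0.60, is incorporated by appending it to `xs` and `ys`. Since the machine carries all intermediate entries to full floating-point precision, whereas the desk calculation rounds each entry to six places, an entry of such an additional row may differ from the tabulated one by a unit or two in the sixth place.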


2.8 Inverse Interpolation 


It frequently happens that a variable y is given in tabular form (or analytically) 
as a single-valued function of x, say y = f(x), and that a value of the independent
variable x is required for which the dependent variable y takes on a prescribed
value (frequently zero). This is the problem of inverse interpolation.

If $\bar{y} = f(\bar{x})$, then on any x interval including $\bar{x}$ in which dy/dx = f'(x)
exists and does not vanish, a unique inverse function, say x = F(y), exists,
such that $\bar{x} = F(\bar{y})$. Thus, if dy/dx does not vanish near the point where the
inverse interpolation is to be effected (so that y increases or decreases steadily 
in the neighborhood of that point), it may be that F(y) can be satisfactorily 
approximated in that neighborhood by a polynomial of moderately low degree, 
so that the inverse interpolation may be effected by merely tabulating x as a 
function of y in that neighborhood, and using the preceding methods (or any 
other appropriate methods) of direct interpolation.

In illustration, suppose that the following data are available and that the
zero of y(x) between x = 1.3 and x = 1.4 is required.


x    1.1      1.2      1.3      1.4      1.5
y    0.769    0.472    0.103   -0.344   -0.875


If Aitken's method is used, with the entries ordered with respect to the nearness
of an ordinate to zero, the calculations may be arranged as follows:


 y_i    x_i                                              y_i
  103   1.3                                              103
 -344   1.4   1.32304                                   -344
  472   1.2      2791   1.32509                          472
  769   1.1      3093       548   1.32447                769
 -875   1.5      2106       432        82   1.32463     -875

(Here the ordinates are entered in units of 0.001, and entries such as 2791
abbreviate 1.32791.)

Thus, a fourth-degree interpolation yields x ≈ 1.3246, with its last place
in doubt, although the uncertainty corresponding to the presence of roundoff
in the given data would also remain. Actually, the given data are exact values
corresponding to the algebraic relation $y = -x^3 + x + 1$, and the problem
can be considered as that of determining the real zero of the algebraic equation
$x^3 - x - 1 = 0$, the true value of which is 1.32472, to five places.
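The inverse interpolation of this illustration amounts to applying the iterated-interpolation array with the roles of x and y interchanged. The following Python sketch is illustrative only; the function name `aitken_diagonal` is an assumption made here.

```python
def aitken_diagonal(us, vs, u):
    """Tabulate v as a function of u and return the diagonal entries of the
    Aitken array for the argument u (successive interpolates of rising degree)."""
    n = len(us)
    table = [[vs[i]] for i in range(n)]
    for k in range(1, n):
        for i in range(k, n):
            a = table[k - 1][k - 1]
            b = table[i][k - 1]
            table[i].append((a * (us[i] - u) - b * (us[k - 1] - u)) / (us[i] - us[k - 1]))
    return [table[k][k] for k in range(n)]


# The data are exact values of y = -x^3 + x + 1, ordered by nearness of y to zero.
x_values = [1.3, 1.4, 1.2, 1.1, 1.5]
y_values = [-x ** 3 + x + 1 for x in x_values]

# Inverse interpolation: x is treated as a function of y, evaluated at y = 0.
estimates = aitken_diagonal(y_values, x_values, 0.0)
print(estimates)
```

The final estimate agrees with the tabulated 1.3246 to four places. Because the machine does not round the intermediate entries, individual entries may differ from the desk calculation by a unit in the fifth place.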

Evidently, if this problem were stated in its analytic form, recourse to a 
semianalytic method such as that of successive substitutions or the Newton- 
Raphson iteration (see Sec. 10.11) would also be appropriate. Even when the 
correspondence is given only in tabular form, it would also be possible to 
approximate the relation y = f(x) by the relation $y = p_{0,\ldots,n}(x)$, where the
equation of the approximation is expressed in explicit polynomial form, with 
the help of Newton’s interpolation formula or of one of the other formulas 
to be obtained, and to solve the resultant approximating algebraic equation by 
such iterative methods. However, in order to estimate the accuracy obtained, 
it would be desirable to repeat the calculation for several values of n, each of 
which would lead to a distinct algebraic equation. 

If dy/dx vanishes near the point $(\bar{x}, \bar{y})$ where the inverse interpolation
is to be effected, then the derivative of the inverse function becomes infinite
near that point, and a satisfactory approximation to the inverse function generally
cannot be obtained by using a polynomial of low degree. In such a case,
a simple iterative procedure is useful. For this purpose, suppose first that two
abscissas $x_a$ and $x_b$ are available with the property that $\bar{y}$ lies between
$y_a = f(x_a)$ and $y_b = f(x_b)$ (see Fig. 2.1). If $y_a$ and $y_b$ are sufficiently nearly equal,
and if dy/dx ≠ 0 in the interval between $x_a$ and $x_b$, linear inverse interpolation
may then be used to obtain a first approximation to $\bar{x}$, say $x^{(1)}$. Then, by
direct interpolation, using the ordinates $y_a$, $y_b$, and an appropriate number of
other known ordinates, the true value $f(x^{(1)})$ may be approximated. Then, if
that result is designated as $y^{(1)}$, linear inverse interpolation based on $y^{(1)}$ and
either $y_a$ or $y_b$ (whichever one is separated from $y^{(1)}$ by $\bar{y}$) is used to determine
a second approximation to $\bar{x}$, say $x^{(2)}$, and the cycle of operations is repeated
as often as necessary.
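The cycle of operations just described may be sketched in Python as follows. The code is an illustration under stated assumptions and not part of the original text: the names `lagrange_eval` and `inverse_by_iteration`, the tolerance, and the iteration limit are all choices made here, and the direct interpolation is carried out with all the tabulated ordinates in Lagrangian form. The data of the preceding illustration are used merely as a convenient test case.

```python
def lagrange_eval(xs, ys, x):
    """Direct interpolation: value at x of the polynomial through all (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def inverse_by_iteration(xs, ys, y_bar, xa, xb, tol=1e-10, max_iter=100):
    """Find x with interpolated f(x) = y_bar, given that y_bar lies between f(xa) and f(xb)."""
    ya = lagrange_eval(xs, ys, xa)
    yb = lagrange_eval(xs, ys, xb)
    x_new = xa
    for _ in range(max_iter):
        # Linear inverse interpolation between the two current points.
        x_new = xa + (y_bar - ya) * (xb - xa) / (yb - ya)
        # Direct interpolation for the ordinate at the new abscissa.
        y_new = lagrange_eval(xs, ys, x_new)
        if abs(y_new - y_bar) < tol:
            break
        # Retain the end point which is separated from y_new by y_bar.
        if (ya - y_bar) * (y_new - y_bar) < 0:
            xb, yb = x_new, y_new
        else:
            xa, ya = x_new, y_new
    return x_new


xs = [1.1, 1.2, 1.3, 1.4, 1.5]
ys = [0.769, 0.472, 0.103, -0.344, -0.875]

root = inverse_by_iteration(xs, ys, 0.0, 1.3, 1.4)
print(root)
```

Since the five ordinates here are exact values of a cubic, the direct interpolation reproduces that cubic, and the iteration converges to its zero near 1.32472 rather than to the value 1.3246 afforded by fourth-degree inverse interpolation.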

Methods of this sort, in which the only inverse interpolation involved is
linear, and in which high-degree interpolation is effected only on the direct
function f(x), are particularly to be recommended in those cases when it is
known that f(x) can be satisfactorily approximated by a polynomial of reasonably
low degree over an interval including $\bar{x}$, but when it is difficult to be certain
that the inverse function F(y), such that $\bar{x} = F(\bar{y})$, also can be fairly approximated
by a polynomial in y, of comparable degree, over the corresponding
interval in y. Whereas situations of this sort obviously are to be anticipated
when f'(x) vanishes for a real value of x near $\bar{x}$, they may also occur in the
absence of such a warning.

In critical cases, in particular in the case when dy/dx = 0 at the desired
point, it is usually desirable to use one of the semianalytic methods mentioned
earlier, in which f(x) is approximated by a polynomial p(x) and the algebraic
equation $p(x) = \bar{y}$ is solved by an appropriate iterative method.


2.9 Supplementary References 


Standard references dealing with divided differences and with the basic New- 
tonian interpolation formula include Steffensen [1950], Milne-Thomson [1951], 
and Hartree [1958].

In Krogh [1970], algorithms are presented for the purpose of making the
use of Newton’s divided-difference formula for interpolation and numerical 
differentiation on a computer compare favorably in efficiency with the use of 
other formulas. 

A method of iterated interpolation similar to that of Aitken [1932b]
is due to Neville [1934]; see also Kopal [1961]. References to additional 
methods of inverse interpolation are included in Sec. 3.12 and Sec. 4.13. 

For the use of divided differences in two-dimensional analysis see, for 
example, Salzer [1959] and Stancu [1964]. 


PROBLEMS 
Section 2.2 


1 Use (2.2.5) to calculate approximate values of f(x) when x = 1.1416, 1.1600, and
1.2000 from the following rounded data:


x      1.1275    1.1503    1.1735    1.1972
f(x)   0.11971   0.13957   0.15931   0.17902


2 Calculate the three first divided differences relevant to successive pairs of data
in Prob. 1, and use (2.2.4) to determine approximate values of f(x) for

x = 1.1600(0.0020)1.1700†


3 Prove that f[xo, x,] is independent of xo and x, if and only if f(x) is a linear 
function of x. 


4 If f(x) = u(x)v(x), show that

$$f[x_0, x_1] = u[x_0]\,v[x_0, x_1] + u[x_0, x_1]\,v[x_1]$$

5 If f'(x) is continuous for $x_0 \le x \le x_1$, show that

$$f[x_0, x_1] = f'(\xi)$$

for some $\xi$ between $x_0$ and $x_1$, and hence also that

$$f[x_0, x_0] \equiv \lim_{x_1 \to x_0} f[x_0, x_1] = f'(x_0)$$


Section 2.3 


6 If the abscissas in Prob. 1 are numbered in increasing algebraic order, verify
numerically that $f[x_0, x_1, x_2] = f[x_2, x_0, x_1]$.


† The notation x = m(h)n denotes that x is to take on values between x = m and
x = n, inclusive, at increments of h units.

7 Suppose that $x_r = x_0 + rh$ (r = 1, 2, ...), so that the abscissas are at a uniform
spacing h. Show that (2.3.3) then becomes

$$\alpha_i^{(k)} = \frac{(-1)^{k-i}}{i!\,(k-i)!\,h^k} = \frac{(-1)^{k-i}}{h^k\,k!}\binom{k}{i}$$

where $\binom{k}{i}$ is the binomial coefficient. Thus deduce that

$$f[x_0, \ldots, x_k] = \frac{1}{k!\,h^k}\sum_{i=0}^{k} (-1)^{k-i}\binom{k}{i} f(x_i)$$

in this case.


8 Assuming that $x_r = x_0 + rh$, verify directly (from the definition) the truth of the
following special cases of the relation established in Prob. 7:

$$f[x_0, x_1] = \frac{1}{1!\,h}\,[f(x_1) - f(x_0)]$$

$$f[x_0, x_1, x_2] = \frac{1}{2!\,h^2}\,[f(x_2) - 2f(x_1) + f(x_0)]$$

$$f[x_0, x_1, x_2, x_3] = \frac{1}{3!\,h^3}\,[f(x_3) - 3f(x_2) + 3f(x_1) - f(x_0)]$$


9 If f'(x) = df(x)/dx, show that

$$\frac{d}{dx} f[x_0, x] \ne f'[x_0, x]$$

unless f(x) is linear.


10 If f(x) = u(x)v(x), show that the relation established in Prob. 4 generalizes to
the form

$$f[x_0, \ldots, x_n] = \sum_{k=0}^{n} u[x_0, \ldots, x_k]\, v[x_k, \ldots, x_n]$$

Use induction, assuming the truth of the relation for n = N, showing that then

$$f[x_1, \ldots, x_{N+1}] - f[x_0, \ldots, x_N] = \sum_{k=0}^{N} \Big\{ (x_{N+1} - x_k)\, u[x_0, \ldots, x_k]\, v[x_k, \ldots, x_{N+1}] + (x_{k+1} - x_0)\, u[x_0, \ldots, x_{k+1}]\, v[x_{k+1}, \ldots, x_{N+1}] \Big\}$$

and that this expression properly reduces to

$$(x_{N+1} - x_0)\Big( u[x_0]\, v[x_0, \ldots, x_{N+1}] + \sum_{k=1}^{N} u[x_0, \ldots, x_k]\, v[x_k, \ldots, x_{N+1}] + u[x_0, \ldots, x_{N+1}]\, v[x_{N+1}] \Big)$$

11 If f(x) = (ax + b)/(cx + d), obtain expressions for f[x, y], f[x, x, y], and
f[x, x, y, y] in compact forms when x ≠ y.

12 If f(x) = x°, obtain expressions for f[a, b, c], f[a, a, b], and f[a, a, a] when
a ≠ b ≠ c.


Section 2.4 


13 Repeat the calculations of Prob. 2, making use of the second divided difference
$f[a_0, a_1, a_2]$.

14 Compare the results of Prob. 13 with those obtained by using the second divided
difference $f[a_1, a_2, a_3]$ instead.

15 Obtain the formula

$$\int_{x_0}^{x_1} f(x)\,dx = (x_1 - x_0) f(x_0) + \tfrac{1}{2}(x_1 - x_0)^2 f[x_0, x_1] - \tfrac{1}{6}(x_1 - x_0)^3 f[x_0, x_1, x_2] + E$$

where

$$E = \int_{x_0}^{x_1} (x - x_0)(x - x_1)(x - x_2)\, f[x_0, x_1, x_2, x]\,dx$$


16 Apply the formula of Prob. 15, neglecting the error term, to the data of Prob. 1, 
obtaining approximate values of the integral of f(x) over each subinterval and 
hence obtaining also approximate values of the integral from the smallest abscissa 
to each of the others. Then use interpolation to approximate the integral over 
[1.14, 1.18]. 


Section 2.5 


17 If f(x) = 1/(a - x), show that

$$f[x_0, x_1, \ldots, x_n] = \frac{1}{(a - x_0)(a - x_1) \cdots (a - x_n)}$$

and

$$f[x_0, x_1, \ldots, x_n, x] = \frac{1}{(a - x_0)(a - x_1) \cdots (a - x_n)(a - x)}$$

and deduce that

$$\frac{1}{a - x} = \frac{1}{a - x_0} + \frac{x - x_0}{(a - x_0)(a - x_1)} + \cdots + \frac{(x - x_0) \cdots (x - x_{n-1})}{(a - x_0) \cdots (a - x_n)} + E(x)$$

where

$$E(x) = \frac{\pi(x)}{\pi(a)\,(a - x)}$$

with the abbreviation

$$\pi(x) = (x - x_0)(x - x_1) \cdots (x - x_n)$$

Assume throughout that $a \ne x_0, x_1, \ldots, x_n, x$.

18 The following table lists the rounded value of the probability Q that the magnitude
of a normally distributed error, with mean value zero and standard deviation
unity, exceeds ε, for certain values of ε. Calculate from it approximate
values of Q for ε = 0.7, 0.9, 1.1, and 1.2.


ε    0.4       0.5       0.6       0.8       1.0       1.25
Q    0.68916   0.61708   0.54851   0.42371   0.31731   0.21130


19 Use the data of Prob. 18 to calculate approximate values of ε for Q = 0.4, 0.5,
and 0.6.

20 Suppose that values of f(x), f'(x), and f''(x) are known for $x = x_0$, values of f(x)
and f'(x) for $x = x_1$, and the value of f(x) for $x = x_2$. Show that the corresponding
divided-difference table appears as follows, through third differences, where
each difference is formed from diagonally adjacent entries to its left by the usual
rule, the values of the derivatives being entered in advance:


x_0   f(x_0)
                f'(x_0)
x_0   f(x_0)                   (1/2)f''(x_0)
                f'(x_0)                            f[x_0, x_0, x_0, x_1]
x_0   f(x_0)                   f[x_0, x_0, x_1]
                f[x_0, x_1]                        f[x_0, x_0, x_1, x_1]
x_1   f(x_1)                   f[x_0, x_1, x_1]
                f'(x_1)                            f[x_0, x_1, x_1, x_2]
x_1   f(x_1)                   f[x_1, x_1, x_2]
                f[x_1, x_2]
x_2   f(x_2)


Notice also that "Sheppard's rules" remain applicable to any difference path
made up of contiguous diagonal segments, and write down the formula which
introduces successively the values of $f(x_0)$, $f'(x_0)$, $f''(x_0)$, $f(x_1)$, $f'(x_1)$, and $f(x_2)$.

21 The following rounded values of Q(ε) and its derivative Q'(ε) are known. By
appropriately modifying the procedure illustrated in Prob. 20, construct a suitable
difference table and calculate approximate values of Q for ε = 0.2(0.2)0.8.


ε      Q        Q'
0.0    1.0000   -0.7979
0.5    0.6171   -0.7041
1.0    0.3173   -0.4839


22 Assuming that the third divided difference of f(x) is constant for all x, fill in the
spaces in the following divided-difference table (from right to left), and hence
evaluate f'(8) and f''(8):


0     3
             1
1     4             4
            13              1
3    30            10
            63              1
6   219            17
           148              -
8   515             -
             -              -
8   515             -
             -
8   515


Also use a similar procedure to obtain f''(3). Determine an analytic expression for
f(x) and check the results.

23 If $f(x_1)$, $f(x_2)$, and $f(x_3)$ are values of f(x) near a maximum or minimum point at
$x = \bar{x}$, obtain the approximation

$$\bar{x} \approx \frac{x_1 + x_2}{2} - \frac{f[x_1, x_2]}{2 f[x_1, x_2, x_3]}$$

and show that it can also be written in the more symmetrical form

$$\bar{x} \approx \frac{x_1 + 2x_2 + x_3}{4} - \frac{f[x_1, x_2] + f[x_2, x_3]}{4 f[x_1, x_2, x_3]}$$

Show also that, when the abscissas are equally spaced, it becomes

$$\bar{x} \approx x_2 - \frac{h}{2}\,\frac{f_3 - f_1}{f_1 - 2f_2 + f_3}$$

where h is the common interval.


Section 2.6


24 Show that the truncation error associated with linear interpolation of f(x), using
ordinates at $x_0$ and $x_1$ with $x_0 \le x \le x_1$, is not larger in magnitude than

$$\tfrac{1}{8} M_2 (x_1 - x_0)^2$$

where $M_2$ is the maximum value of |f''(x)| on the interval $[x_0, x_1]$. Does this
result hold also for extrapolation?

25 Under the assumption that the data in Prob. 1 correspond to the function f(x) =
sin (log x), show that the truncation error corresponding to linear interpolation
between successive ordinates is smaller than one unit in the fourth decimal place.

26 Show that the magnitude of the truncation error, corresponding to linear interpolation
of the error function

$$\operatorname{erf} x = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt$$

between $x_0$ and $x_1$, cannot exceed

$$\frac{(x_1 - x_0)^2}{2\sqrt{2\pi e}}$$

and hence is smaller than $(x_1 - x_0)^2/8$.

27 In the special case when the abscissas are equally spaced, with separation h, show
that the magnitude of the truncation error corresponding to second-degree interpolation
based on ordinates at $x_0$, $x_1$, and $x_2$ does not exceed $(M_3 h^3)/(9\sqrt{3})$,
where $M_3$ is the maximum value of |f'''(x)| on the interval $[x_0, x_2]$. Show also
that, on the average, the largest errors may be expected to occur at distances of
about $h/\sqrt{3} \approx 0.58h$ from the central abscissa. (Translate the origin to the point
$x = x_1$.)

28 Show that the magnitude of the truncation error associated with third-degree
interpolation based on ordinates at equally spaced points $x_0$, $x_1$, $x_2$, and $x_3$ does
not exceed $(3M_4 h^4)/128$ for interpolation between $x_1$ and $x_2$ and is, on the average,
largest at the center of that interval. Show also that it does not exceed $(M_4 h^4)/24$
for interpolation between $x_0$ and $x_1$ or between $x_2$ and $x_3$, with a maximum to be
expected, on the average, at a distance of about $(3 - \sqrt{5})h/2 \approx 0.38h$ from $x_0$
or $x_3$, where $M_4$ is the maximum value of $|f^{(4)}(x)|$ on $[x_0, x_3]$ in all cases. [Translate
the origin to the midpoint $(x_1 + x_2)/2$.]

29 Obtain the formula

$$f(x) = f(x_0) + (x - x_0) f'(x_0) + (x - x_0)^2 f[x_0, x_0, x_1] + (x - x_0)^2 (x - x_1) f[x_0, x_0, x_1, x_1] + E(x)$$

where

$$E(x) = \tfrac{1}{24}(x - x_0)^2 (x - x_1)^2 f^{(4)}(\xi) \qquad (x_0 < \xi < x_1)$$

and show that

$$|E(x)| \le \frac{h^4}{384} \max |f^{(4)}(x)|$$

for $x_0 \le x \le x_1$, where $h = x_1 - x_0$.

30 If f(x) = 1/(x + 1) and y(x) is the polynomial approximation of degree n which
agrees with f(x) when x = 0, 1, 2, ..., n, show that the use of (2.6.5) leads to the
error bound

$$|E(x)| \le |x(x - 1) \cdots (x - n)|$$

whereas (2.6.1) permits the less conservative bound

$$|E(x)| \le \frac{1}{(n+1)!}\, |x(x - 1) \cdots (x - n)|$$

when $x \ge 0$.
31 Suppose that a table presents values of f(x) rounded to r decimal places at a uniform
interval h in x, and that linear interpolation is employed for the calculation of
f(x). Suppose also that the tabular abscissas are exact, that the abscissa x is
rounded to s decimal places, and that the calculated approximate value of f(x)
is rounded to t decimal places. If f''(x) is continuous over the tabular range, and
if δ is the total error in the resultant interpolate, show that

$$|\delta| \le \tfrac{1}{8} M_2 h^2 + 5 M_1 \times 10^{-s-1} + 5 \times 10^{-r-1} + 5 \times 10^{-t-1}$$

where $M_1$ and $M_2$ are the maximum values of |f'(x)| and |f''(x)|, respectively, in
the tabular range.

32 The function $f(x) = \log_{10} \sin x$ is tabulated for x = 0.01(0.01)2.00 to five decimal
places. If linear interpolation is employed, with the abscissa of the interpolant
rounded to five decimal places, and if the calculated result is also rounded to five
places, determine the portions of the table for which the results certainly will be
correct within five units, and within 50 units, in the fifth place. What accuracy
could be guaranteed over those ranges if the abscissa of the interpolant were
rounded to four places? To three places?

33 A table of values of the function $f(x) = (x^4 - x)/12$ is to be constructed for
$0 \le x \le a$ in such a way that the error in linear interpolation would not exceed ε
if the effects of roundoff were negligible. Show first that, if the spacing h is to be
uniform, then h should be smaller than $2\sqrt{2\varepsilon}/a$ and at least $a^2/(2\sqrt{2\varepsilon})$ entries will
be required. Show also that, if the interval [0, a] were divided into the subintervals
[0, α] and [α, a], and if uniform spacings $h_1$ and $h_2$ were used in those respective
subintervals, then the most efficient division would be such that the conditions
$\alpha = a/2$, $h_2 = 2\sqrt{2\varepsilon}/a$, and $h_1 = 2h_2$ were approximately satisfied, corresponding
to a reduction of about 25 percent in the number of entries.

From the following table of rounded values of f(x) = (x/10)^{1/2}, construct a
divided-difference table and determine successive approximations to f(0.5) =
(0.05)^{1/2} corresponding to the use of one, two, three, four, and five successive
ordinates, including that at x = 0. Compare these results with the true value.
How could the existent situation have been predicted (without direct calculation)
assuming knowledge of the analytical form of f(x)? What preliminary warning is
afforded by reference to the difference table alone?

    x      0        1        2        3        4
    f(x)   0.00000  0.31623  0.44721  0.54772  0.63246


Form a divided-difference table based only on the ordinates of the function
f(x) = x^5 - 5x^3 + x^2 + 4x - 2 at the points x = -2, -1, 0, 1, and 2. Then
interpolate from this table approximate values of f(x) at x = -1.5, -0.5, 0.5,
and 1.5, and compare them with the true values. How could the possibility of the
existent situation be predicted (without direct calculation) assuming knowledge
of the analytical form of f(x)?
Section 2.7

36 Use the Aitken procedure to determine Q(0.7) and ε(0.5) as accurately as
possible from the data of Prob. 18.

37 Use the Aitken procedure to determine f(0.20000) as accurately as possible from
the following rounded values of f(x) = sin [sinh^{-1} (x + 1)]:

    x      0.17520  0.25386  0.33565  0.42078  0.50946
    f(x)   0.84147  0.86742  0.89121  0.91276  0.93204

38 Deduce the validity of Aitken’s method by establishing the relations

    p_{0,1,\ldots,m,n}(x) = p_{0,1,\ldots,m-1,m}(x)
        + (x - x_0) \cdots (x - x_{m-1})(x - x_m) f[x_0, \ldots, x_m, x_n]

    p_{0,1,\ldots,m,n}(x) = p_{0,1,\ldots,m-1,n}(x)
        + (x - x_0) \cdots (x - x_{m-1})(x - x_n) f[x_0, \ldots, x_m, x_n]

and eliminating f[x_0, ..., x_m, x_n] between them.


Section 2.8

39 If y = f(x) and if f'(x) ≠ 0 for x_0 < x < x_1, show that the truncation error of
linear inverse interpolation based on corresponding values (x_0, y_0) and (x_1, y_1)
is given by

    -(y - y_0)(y - y_1) \frac{f''(\xi)}{2[f'(\xi)]^3}

where x_0 < ξ < x_1, if f''(x) exists and is continuous in that interval.
Show also that the magnitude of this error is limited by each of the bounds

    \frac{K (y_1 - y_0)^2}{8} \qquad \frac{K M_1^2 h^2}{8} \qquad \frac{h^2}{8} \left(\frac{M_1}{m_1}\right)^2 \frac{M_2}{m_1}

if h = x_1 - x_0, |f''(x)/[f'(x)]^3| ≤ K, m_1 ≤ |f'(x)| ≤ M_1, and |f''(x)| ≤ M_2
for x_0 ≤ x ≤ x_1.

40 Suppose that f(x) = x^2 is tabulated for 0 ≤ x ≤ 1 with a uniform spacing of
h in x. Assuming that sufficiently many significant figures are supplied and retained
in the calculation to permit the neglect of the effects of roundoff errors, determine
α (as a function of h and ε) so that the error of linear inverse interpolation will not
exceed a specified quantity ε on the interval [α, 1]. What spacing would be
required to assure an accuracy within 0.005 for 0.1 ≤ x ≤ 1.0?

41 Repeat the calculations of Prob. 40 when

    f(x) = \int_0^x \sin t^2 \, dt

[Use the inequality sin u > 2u/π (0 < u < π/2) in bounding the error.]
42 Tabulate y = x^2 for x = 0, 1, 2, 3, and 4, and then interchange the roles of
x and y; now considering x as a function of y (with x ≥ 0), use fourth-degree
interpolation to obtain an approximation to x when y = 12. Compare the result
with the true value and account for the existent situation analytically. Also
plot the relevant fourth-degree interpolation polynomial x = p(y), with appropriate
attention to its maxima and minima, and superimpose a plot of the true relation.

43 Given the following data, use the iterative process of inverse and direct
interpolation to determine, to four decimal places, the value of x between 1.50 and
1.60 for which f(x) = 0.99800:

    x      1.40     1.50     1.60     1.70     1.80
    f(x)   0.98545  0.99749  0.99957  0.99166  0.97385

44 Calculate an approximation to the value of x required in Prob. 43 by approximating
f(x) by the second-degree polynomial p(x) which agrees with f(x) at the points
for which x = 1.50, 1.60, and 1.70, and solving the quadratic equation p(x) =
0.99800. Then use the iterative method of Prob. 43 to obtain an improved
approximation which may be expected to be correct to four decimal places.

45 The following critical table for the function f(x) = x(x - 1)(2x - 1)/12 has the
property that, for any x between successive tabular abscissas, the corresponding
value of f(x) rounds to the entry given for that interval:

    x          f(x)
    0.05667
               0.0040
    0.05844
               0.0041
    0.06025
               0.0042
    0.06208
               0.0043
    0.06394

Construct the table, by first tabulating f(x) for appropriate convenient values of x
and then using inverse interpolation to obtain x when

    f(x) = 0.00395(0.00010)0.00435

or otherwise.
LAGRANGIAN METHODS


3.1 Introduction 


For many purposes, it is desirable that a formula for interpolation, numerical 
differentiation, or numerical integration be expressed explicitly in terms of the 
ordinates involved rather than in terms of their differences or divided differences. 
Such formulas permit a more direct consideration of the effect on the end result 
of a change or error in one or more of the ordinates, and their use does not 
require the calculation, tabulation, or storage of differences. However, it is 
found that these advantages are attained only at the sacrifice of others. 

The basic formula, apparently due to Waring, but associated with the 
name of Lagrange, is derived in Sec. 3.2, and its general use in interpolation, 
differentiation, and integration is illustrated in Secs. 3.3 and 3.4. A number of 
specific formulas and techniques for numerical integration and differentiation 
are derived from this formula and, in the cases when the abscissas are equally 
spaced, these are studied in the remaining sections of the chapter. 
3.2 Lagrange’s Interpolation Formula

Lagrange’s form of the polynomial y(x) = p_{0,...,n}(x) of degree n, which takes
on the same values as a given function f(x) for the n + 1 distinct abscissas
x_0, x_1, ..., x_n, differs from the newtonian form derived in Sec. 2.5 in that the
ordinates involved are displayed explicitly in the lagrangian form, while the
newtonian form explicitly involves divided differences of those ordinates.
Whereas it clearly must be possible to derive Lagrange’s form from (2.5.2),
its importance justifies the indication of three alternative methods of approach,
which are typical of methods also useful in other considerations.
As a first approach, we could write y(x) in the form

    y(x) = A_0 + A_1 x + \cdots + A_n x^n = \sum_{k=0}^{n} A_k x^k        (3.2.1)

where the A’s are to be determined in such a way that y(x_i) = f(x_i) for i =
0, 1, ..., n. These requirements are represented by the n + 1 linear equations

    A_0 + A_1 x_0 + A_2 x_0^2 + \cdots + A_n x_0^n = f(x_0)
    \cdots                                                        (3.2.2)
    A_0 + A_1 x_n + A_2 x_n^2 + \cdots + A_n x_n^n = f(x_n)


If these equations are solved by use of determinants, the use of special properties 
of the determinants involved leads to rather simple expressions for the A’s in 
terms of the ordinates, and the introduction of these results into (3.2.1) leads 
to the desired result (see Prob. 5). The requirement that the A’s satisfy (3.2.1) 
and (3.2.2) can be expressed by the condition 


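    \begin{vmatrix}
    y      & 1      & x      & x^2    & \cdots & x^n    \\
    f(x_0) & 1      & x_0    & x_0^2  & \cdots & x_0^n  \\
    \vdots & \vdots & \vdots & \vdots &        & \vdots \\
    f(x_n) & 1      & x_n    & x_n^2  & \cdots & x_n^n
    \end{vmatrix} = 0        (3.2.3)

the expanded form of which would also give the equation of the required
interpolation polynomial y = p_{0,...,n}(x).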
Alternatively, we could write y(x) directly in the required form 


    y(x) = l_0(x) f(x_0) + l_1(x) f(x_1) + \cdots + l_n(x) f(x_n) = \sum_{k=0}^{n} l_k(x) f(x_k)        (3.2.4)

where l_0(x), ..., l_n(x) are polynomials of degree n or less, to be determined by
the requirement that the result of replacing y(x) by f(x) be an identity when
f(x) is an arbitrary polynomial of degree n or less. It is clear that this situation
will prevail if and only if the result of replacing y(x) by f(x) is an identity when
f(x) = 1, x, x^2, ..., and x^n. These requirements are represented by the n + 1
equations

    l_0(x) + l_1(x) + \cdots + l_n(x) = 1
    x_0 l_0(x) + x_1 l_1(x) + \cdots + x_n l_n(x) = x
    \cdots                                                        (3.2.5)
    x_0^n l_0(x) + x_1^n l_1(x) + \cdots + x_n^n l_n(x) = x^n

from which the coefficient functions can be determined directly as ratios of
determinants which can be expanded in simple forms. The eliminant of the
Eqs. (3.2.4) and (3.2.5) is merely the result of interchanging rows and columns
in the matrix whose determinant appears in (3.2.3), so that the equivalence of the
final forms is indeed confirmed.

Rather than pursue either of these lines, we may avoid somewhat lengthy
calculation by noticing that the expression (3.2.4) will indeed take on the value
f(x_i) when x = x_i if l_i(x_i) = 1 and if l_i(x_j) = 0 when j ≠ i. With the convenient
notation of the so-called Kronecker delta

    \delta_{ij} = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}        (3.2.6)

this requirement becomes merely

    l_i(x_j) = \delta_{ij}    (i = 0, \ldots, n; \; j = 0, \ldots, n)        (3.2.7)


Since l_i(x) thus is to be a polynomial of degree n which vanishes when
x = x_0, x_1, ..., x_{i-1}, x_{i+1}, ..., x_n, there must follow

    l_i(x) = C_i (x - x_0) \cdots (x - x_{i-1})(x - x_{i+1}) \cdots (x - x_n)        (3.2.8)

where C_i is a constant. The final requirement l_i(x_i) = 1 then determines C_i
in the form

    C_i = \frac{1}{(x_i - x_0) \cdots (x_i - x_{i-1})(x_i - x_{i+1}) \cdots (x_i - x_n)}        (3.2.9)

and the desired lagrangian coefficient functions l_i(x) are obtained by introducing
(3.2.9) into (3.2.8).

In order to put this result in a somewhat more compact form, we first
review the notation of (2.6.2):

    \pi(x) = (x - x_0)(x - x_1) \cdots (x - x_n)        (3.2.10)

Now the derivative of π(x) is clearly expressible as the sum of n + 1 terms,
in each of which one of the factors of π(x) is deleted. Thus, if we set x = x_i
in this expression, we obtain the useful result

    \pi'(x_i) = (x_i - x_0) \cdots (x_i - x_n) = \frac{1}{C_i}        (3.2.11)

where the factor (x_i - x_i) is to be omitted in the product. Thus, after introducing
(3.2.8) and (3.2.9) into (3.2.4), we obtain the lagrangian interpolation
polynomial of degree n in the form


    y(x) = \sum_{k=0}^{n} \frac{\pi(x)}{(x - x_k)\,\pi'(x_k)} f(x_k) = \sum_{k=0}^{n} l_k(x) f(x_k)        (3.2.12)

where

    l_i(x) = \frac{\pi(x)}{(x - x_i)\,\pi'(x_i)}
           = \frac{(x - x_0) \cdots (x - x_{i-1})(x - x_{i+1}) \cdots (x - x_n)}{(x_i - x_0) \cdots (x_i - x_{i-1})(x_i - x_{i+1}) \cdots (x_i - x_n)}        (3.2.13)

The first expression for l_i(x) is useful in theoretical considerations, the second
in the actual calculation of the function.

It should be noticed that the definitions of the functions π(x) and l_i(x)
involve the degree n of the interpolation polynomial. Generally, in the sequel,
the value of n will be clear from the context. When a more explicit notation is
necessary, we may replace l_i(x) by l_{i,n}(x) and π(x) by π_n(x).

The direct derivation of (3.2.12) from the newtonian form (2.5.2) is of some 
academic interest and may be effected by making use of (2.3.2) and comparing 
(2.3.3) with (3.2.11). 

In view of the equivalence of (3.2.12) and (2.5.2), the error committed
by replacing f(x) by y(x) is again given by either (2.6.1) or (2.6.5), so that we
may write

    f(x) = \sum_{k=0}^{n} l_k(x) f(x_k) + E(x)        (3.2.14)

where

    E(x) = \pi(x) f[x_0, \ldots, x_n, x] = \pi(x) \frac{f^{(n+1)}(\xi)}{(n+1)!}        (3.2.15)

and where, as before, ξ is some number in the interval I spanned by x_0,
x_1, ..., x_n, and x.

To illustrate the use of the lagrangian formula, we may write down the
interpolation polynomial of degree 3 relevant to the data

    x      -1    0    1    2
    f(x)    1    1    1   -5

in the form

    y = \frac{(x - 0)(x - 1)(x - 2)}{(-1 - 0)(-1 - 1)(-1 - 2)} (1) + \frac{(x + 1)(x - 1)(x - 2)}{(0 + 1)(0 - 1)(0 - 2)} (1)
        + \frac{(x + 1)(x - 0)(x - 2)}{(1 + 1)(1 - 0)(1 - 2)} (1) + \frac{(x + 1)(x - 0)(x - 1)}{(2 + 1)(2 - 0)(2 - 1)} (-5)
      = -\tfrac{1}{6} x(x - 1)(x - 2) + \tfrac{1}{2} (x + 1)(x - 1)(x - 2)
        - \tfrac{1}{2} (x + 1)x(x - 2) - \tfrac{5}{6} (x + 1)x(x - 1)

which may be reduced to

    y = -x^3 + x + 1

For the purpose of actual numerical interpolation, the reduction to this final
form would not be necessary.
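The remark that the reduction is unnecessary can be illustrated by evaluating the lagrangian form directly. The following is a minimal sketch, not part of the original text; the function name `lagrange_eval` is an illustrative assumption, and exact rational arithmetic is used only so that the agreement with y = -x^3 + x + 1 is not obscured by roundoff.

```python
from fractions import Fraction

def lagrange_eval(xs, fs, x):
    """Evaluate the lagrangian interpolation polynomial (3.2.12) at x."""
    total = 0
    for i, (xi, fi) in enumerate(zip(xs, fs)):
        # l_i(x) is the product over k != i of (x - x_k)/(x_i - x_k), as in (3.2.13).
        li = 1
        for k, xk in enumerate(xs):
            if k != i:
                li = li * Fraction(x - xk) / Fraction(xi - xk)
        total += li * fi
    return total

# Data of the worked example in the text.
xs = [-1, 0, 1, 2]
fs = [1, 1, 1, -5]
for x in [Fraction(1, 2), 3]:
    # The two printed values agree, since both describe the same cubic.
    print(x, lagrange_eval(xs, fs, x), -x**3 + x + 1)
```

Each evaluation costs on the order of n^2 multiplications, which is the price paid for not forming differences beforehand.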

On the other hand, whereas the newtonian method would require the
formation of the divided-difference table

    x     f(x)
    -1     1
                 0
     0     1           0
                 0          -1
     1     1          -3
                -6
     2    -5

the use of the indicated difference path would involve only the following
calculation:

    y = 1 + x(0) + x(x - 1)(0) + x(x - 1)(x + 1)(-1)
      = 1 - x(x - 1)(x + 1) = -x^3 + x + 1

The lagrangian form of the interpolation formula f(x) ≈ y(x) possesses
the advantage that its use does not involve preliminary differencing of data. 
However, it has the disadvantage that, unless f(x) is given analytically, so that 
use may be made of the second form of (3.2.15), it is difficult to estimate the 
truncation error relevant to the result afforded by interpolation based on a given 
number of ordinates, or to estimate the number of ordinates needed to reduce 
the truncation error below prescribed limits. If the newtonian formula is 
used, a more or less dependable estimate of accuracy, based essentially on the 
first form of (3.2.15), may be obtained by sampling the first neglected higher- 
order difference. 

Furthermore, in order to improve a certain result by taking into account
one or more additional ordinates, the coefficient functions l_k(x) would have to
be completely redetermined in the lagrangian procedure, whereas the newtonian
procedure would require merely the formation of a higher-order difference, and
the addition of a multiple of that difference to the previously calculated result.

On the other hand, the lagrangian form is much better adapted to the
analysis of the effects of inherent errors in the data. Thus, if the original data
were all correctly rounded to r decimal places, so that the maximum error in each
given ordinate is 5 × 10^{-r-1}, it is seen that the largest possible corresponding
error in the interpolation for f(x) would be

    |R(x)|_{max} = (5 \times 10^{-r-1}) \sum_{k=0}^{n} |l_k(x)|        (3.2.16)


The corresponding calculation based on the newtonian form would be more 
complicated but would, of course, lead to an equivalent result. In addition to 
this error, the errors due to truncation and to intermediate roundoffs must be 
taken into account in either case. 
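The bound (3.2.16) is easily evaluated once the coefficients l_k(x) are available. The sketch below is illustrative only; the function names and the sample abscissas are assumptions, not data from the text.

```python
def lagrange_coefficients(xs, x):
    """Return the values l_0(x), ..., l_n(x) of (3.2.13)."""
    coeffs = []
    for i, xi in enumerate(xs):
        li = 1.0
        for k, xk in enumerate(xs):
            if k != i:
                li *= (x - xk) / (xi - xk)
        coeffs.append(li)
    return coeffs

def max_inherent_error(xs, x, r):
    """Bound (3.2.16) for ordinates correctly rounded to r decimal places."""
    return 5 * 10.0 ** (-r - 1) * sum(abs(l) for l in lagrange_coefficients(xs, x))

xs = [0, 1, 2, 3]            # hypothetical equally spaced abscissas
for x in [1.5, 2.5, 4.0]:    # the last point lies outside the tabular range
    print(x, max_inherent_error(xs, x, 4))
```

Since the l_k(x) always sum to unity, the bound can never fall below 5 × 10^{-r-1}; it grows rapidly when x lies outside the span of the abscissas, where the coefficients are large and alternate in sign.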


3.3 Numerical Differentiation and Integration 


Once an interpolation polynomial y(x) has been determined so that it satis- 
factorily approximates a given function f(x) over a certain interval J, it may be 
hoped that the result of differentiating y(x), or of integrating it over an interval, 
will also satisfactorily approximate the corresponding derivative or integral 
of f(x). However, if we visualize a curve, representing an approximating 
function and oscillating about the curve representing the function approximated, 
we may anticipate the fact that, even though the deviation between y(x) and 
f(x) be small throughout an interval, still the slopes of the two curves represent- 
ing them may differ quite appreciably. Further, it is seen that roundoff errors 
(or errors of observation) of alternating sign in consecutive ordinates could 
affect the calculation of the derivative quite strongly if those ordinates were 
fairly closely spaced. 

On the other hand, since integration is essentially a smoothing process, 
it would be anticipated that the error associated with integration may be 
small even though the interpolation polynomial itself provides only a mod- 
erately good approximation to f(x). 

These expectations are borne out in practice. In particular, numerical 
differentiation should be avoided wherever possible, particularly when the data 
are empirical and subject to appreciable errors of observation. When such a 
calculation must be made, it is desirable first to smooth the data to a certain 
extent. Certain methods of effecting such a smoothing are considered in 
Sec. 7.15. 
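The sensitivity of numerical differentiation to errors of alternating sign may be seen in a small experiment. The sketch below is an illustration under stated assumptions: the test function sin x, the point x = 1, and the error magnitude 5 × 10^{-5} (four-place rounding) are all chosen here and do not come from the text.

```python
import math

def forward_difference(f0, f1, h):
    """Two-point estimate of the derivative from adjacent ordinates."""
    return (f1 - f0) / h

eps = 5e-5  # assumed maximum rounding error of four-place data
for h in [0.1, 0.01, 0.001]:
    clean = forward_difference(math.sin(1.0), math.sin(1.0 + h), h)
    # Perturb consecutive ordinates by errors of opposite sign.
    noisy = forward_difference(math.sin(1.0) - eps, math.sin(1.0 + h) + eps, h)
    # The induced error equals 2*eps/h and grows as the spacing shrinks.
    print(h, noisy - clean, 2 * eps / h)
```

The same perturbations would alter a trapezoidal estimate of the integral over the two ordinates by nothing at all, since the errors cancel in the sum.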
From the lagrangian approximation

    f(x) \approx \sum_{k=0}^{n} l_k(x) f(x_k)        (3.3.1)

with associated error

    E(x) = \pi(x) \frac{f^{(n+1)}(\xi)}{(n+1)!}        (3.3.2)

we obtain the corresponding integral formula

    \int_a^b f(x)\,dx \approx \sum_{k=0}^{n} C_k f(x_k)        (3.3.3)

where the weighting coefficients C_k are given by

    C_k = \int_a^b l_k(x)\,dx        (3.3.4)

and where the associated error can be expressed in the form

    E = \frac{1}{(n+1)!} \int_a^b \pi(x) f^{(n+1)}(\xi)\,dx        (3.3.5)

or in a related form involving a divided difference.

With regard to (3.3.5), it should be remembered that ξ depends in a specific,
but generally unknown, way upon x, so that even though the (n + 1)th derivative
of f were known analytically, generally it would be impossible to evaluate the
integral defining E exactly. However, if it is known that |f^{(n+1)}(x)| ≤ M_{n+1}
on I, where I is limited by the largest and smallest of x_0, x_1, ..., x_n, a, and b,
and where M_{n+1} is a constant, it may be deduced that

    |E| \le \frac{M_{n+1}}{(n+1)!} \int_a^b |\pi(x)|\,dx        (3.3.6)

Further, in those cases where none of the abscissas x_i lie in (a, b), the function
π(x) does not change sign in [a, b] and the second law of the mean (Sec. 1.9)
may be invoked to show that

    E = \frac{f^{(n+1)}(\xi)}{(n+1)!} \int_a^b \pi(x)\,dx        (3.3.7)

where ξ is some number in I.† This last situation exists, in particular, in the
frequently occurring cases when the integration is carried out over the interval
between two adjacent tabular points.

† The symbol ξ is used frequently in a generic sense to indicate an argument known
only to lie in a certain interval, and often it will have different interpretations in
different (possibly related) equations [as in (3.3.5) and (3.3.7)] when the resultant
ambiguity is not believed to be misleading. A similar comment applies to the use
of the symbol E to designate an error term.

Similarly, by differentiating (3.3.1) r times, one obtains the approximation

    f^{(r)}(x) \approx \sum_{k=0}^{n} l_k^{(r)}(x) f(x_k)        (3.3.8)

with the associated error

    E^{(r)}(x) = \frac{1}{(n+1)!} \frac{d^r}{dx^r} [\pi(x) f^{(n+1)}(\xi)]        (3.3.9)

However, since the dependence of ξ upon x is again unknown, the differentiation
in (3.3.9) cannot be explicitly effected.

In order to obtain a somewhat more tractable form of the remainder,
we replace (3.3.2) by the equivalent first form of (3.2.15), which involves the
current variable x itself. The error (3.3.9) then can be expressed in the form

    E^{(r)}(x) = \frac{d^r}{dx^r} \{\pi(x) f[x_0, \ldots, x_n, x]\}        (3.3.10)


If use is made of the Leibnitz formula for the rth derivative of a product,

    D^r(uv) = u\,D^r v + r\,Du\,D^{r-1}v + \frac{r(r-1)}{2!} D^2u\,D^{r-2}v + \cdots + D^r u\;v
            = \sum_{i=0}^{r} \binom{r}{i} D^i u\,D^{r-i} v        (3.3.11)

where D = d/dx and where \binom{r}{i} represents the binomial coefficient

    \binom{r}{i} = \frac{r(r-1) \cdots (r-i+1)}{i!} = \frac{r!}{(r-i)!\,i!}        (3.3.12)


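equation (3.3.10) takes the form

    E^{(r)}(x) = \sum_{i=0}^{r} \binom{r}{i} \pi^{(i)}(x) \frac{d^{r-i}}{dx^{r-i}} f[x_0, \ldots, x_n, x]

or, making use of (2.3.9),

    E^{(r)}(x) = \sum_{i=0}^{r} \frac{r!}{i!} \pi^{(i)}(x) f[x_0, \ldots, x_n, \underbrace{x, \ldots, x}_{r-i+1 \text{ times}}]        (3.3.13)
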
A generalization of the relation (2.6.8) leads to the fact that

    f[x_0, \ldots, x_n, \underbrace{x, \ldots, x}_{m \text{ times}}] = \frac{1}{(n+m)!} f^{(n+m)}(\xi_m)        (3.3.14)

where, for given n, ξ_m lies somewhere in the interval I limited by the largest
and smallest of x_0, ..., x_n, and x. Hence, finally, (3.3.13) can be expressed in
the form

    E^{(r)}(x) = \sum_{i=0}^{r} \frac{r!}{(n+r-i+1)!\,i!} \pi^{(i)}(x) f^{(n+r-i+1)}(\xi_i)        (3.3.15)

where each of the r + 1 numbers ξ_0, ..., ξ_r lies in I.

The expression for the error is thus rather complicated in the general
case, and when the rth derivative is calculated by differentiating an interpolation
polynomial of nth degree, the estimation of the error may involve the estimation
of derivatives of f(x) of orders n + 1, n + 2, ..., n + r, and n + r + 1 in the
interval I. (See, however, Prob. 10.)

It may be noticed that when r > n the right-hand member of (3.3.8)
vanishes identically, since l_k(x) is a polynomial of degree n. Generally, at best
only derivatives of order r for which r is small relative to n are given with any
significant accuracy by this formula.

In the case r = 1, the formula (3.3.8) becomes

    f'(x) \approx \sum_{k=0}^{n} l_k'(x) f(x_k)        (3.3.16)

and the associated error, as given by (3.3.15), is of the form

    E'(x) = \pi'(x) \frac{f^{(n+1)}(\xi_1)}{(n+1)!} + \pi(x) \frac{f^{(n+2)}(\xi_0)}{(n+2)!}        (3.3.17)

where both ξ_0 and ξ_1 lie in the interval I. In particular, for numerical
differentiation at a tabular point, there follows

    f'(x_i) = \sum_{k=0}^{n} l_k'(x_i) f(x_k) + \pi'(x_i) \frac{f^{(n+1)}(\xi_1)}{(n+1)!}        (3.3.18)

since π(x) vanishes when x = x_i, where the factor π'(x_i) has the simple form

    \pi'(x_i) = (x_i - x_0) \cdots (x_i - x_n)        (3.3.19)

in accordance with (3.2.11).
It is seen that this relation is the result which would be obtained by
differentiating the formula

    f(x) = \sum_{k=0}^{n} l_k(x) f(x_k) + \pi(x) \frac{f^{(n+1)}(\xi)}{(n+1)!}

with respect to x, overlooking the fact that ξ is a function of x, setting x = x_i
in the result, and changing ξ to a new parameter ξ_1. Except for the calculation
of the first derivative at a tabular point, (3.3.15) indicates that this procedure
generally would not yield the correct expression for the error term.

It can be shown, however, that the error E^{(r)}(x) can indeed be expressed
in the analogous form

    E^{(r)}(x) = \pi^{(r)}(x) \frac{f^{(n+1)}(\eta)}{(n+1)!}        (3.3.20)

for any positive integer r, where η is somewhere in I, when x is outside or at
one end of the span of the tabular values x_0, ..., x_n (see Prob. 8).


3.4 Uniform-spacing Interpolation 
Since the coefficient function l_i(x) can be expressed in the form

    l_i(x) = \frac{(x - x_0)(x - x_1) \cdots (x - x_{i-1})(x - x_{i+1}) \cdots (x - x_n)}{(x_i - x_0)(x_i - x_1) \cdots (x_i - x_{i-1})(x_i - x_{i+1}) \cdots (x_i - x_n)}        (3.4.1)

it is seen that the form of l_i(x) is invariant under any linear change in variables

    x = a + hs \qquad x_i = a + h s_i        (3.4.2)

where a and h are constants:

    l_i(x) = \frac{(s - s_0)(s - s_1) \cdots (s - s_{i-1})(s - s_{i+1}) \cdots (s - s_n)}{(s_i - s_0)(s_i - s_1) \cdots (s_i - s_{i-1})(s_i - s_{i+1}) \cdots (s_i - s_n)}        (3.4.3)


It is often desirable to choose a and h in such a way that the dimensionless
variable s, which measures distance from a in units of h, takes on convenient
values at the tabular points used in any specific interpolation. For equally spaced
abscissas, h is conveniently identified with the spacing. In the cases when the
abscissas are uniformly spaced, the lagrangian coefficient functions have been
tabulated rather extensively for various values of n. Formulas involving an
odd number of ordinates are most often used, and, if that number is n + 1 =
2m + 1, the abscissas are then conventionally renumbered as x_{-m}, ..., x_{-1},
x_0, x_1, ..., x_m.

If the uniform spacing x_{k+1} - x_k is denoted by h, and if s is measured
from the central point, so that

    x = x_0 + hs \qquad x_k = x_0 + hk        (3.4.4)

equation (3.4.3) then reduces to

    l_i(x) = \frac{(s + m)(s + m - 1) \cdots (s - i + 1)(s - i - 1) \cdots (s - m + 1)(s - m)}{(i + m)(i + m - 1) \cdots (2)(1)(-1)(-2) \cdots (i - m + 1)(i - m)} \equiv L_i(s)
Thus

    L_0(s) = \frac{(1 - s^2)(4 - s^2) \cdots (m^2 - s^2)}{(m!)^2}        (3.4.5)

and

    L_i(s) = \frac{(-1)^{i+1} s(s + i)}{(m + i)!\,(m - i)!}
             \times [(1 - s^2)(4 - s^2) \cdots ((i - 1)^2 - s^2)((i + 1)^2 - s^2) \cdots (m^2 - s^2)]        (3.4.6)

for i = ±1, ±2, ..., ±m.

In illustration, Table 3.1 presents exact values of the lagrangian coefficients
for three-point (quadratic) interpolation to tenths, corresponding to m = 1
(a corresponding five-point table is included in Sec. 4.12):

Table 3.1

    s      L_{-1}(s)   L_0(s)   L_1(s)
    0.0     0          1        0           0.0
    0.1    -0.045      0.99     0.055      -0.1
    0.2    -0.08       0.96     0.12       -0.2
    0.3    -0.105      0.91     0.195      -0.3
    0.4    -0.12       0.84     0.28       -0.4
    0.5    -0.125      0.75     0.375      -0.5
    0.6    -0.12       0.64     0.48       -0.6
    0.7    -0.105      0.51     0.595      -0.7
    0.8    -0.08       0.36     0.72       -0.8
    0.9    -0.045      0.19     0.855      -0.9
    1.0     0          0        1          -1.0
           L_1(s)      L_0(s)   L_{-1}(s)    s

From (3.4.6) it follows that L_i(-s) = L_{-i}(s). This explains the fact that, for
negative values of s, to be read from the right-hand margin, the column labels
at the foot of the table are to be used.

Thus, for example, to interpolate the data

    x      1.00    1.10    1.20    1.30
    f(x)   0.8415  0.8912  0.9320  0.9636

for f(1.24) by use of a three-point formula, the work would be centered at the
nearest tabular point, x = 1.20. With s = 0.04/0.10 = 0.4, and with coefficients
read from the preceding table, there would follow

    f(1.24) \approx (-0.12)(0.8912) + (0.84)(0.9320) + (0.28)(0.9636)
            = 0.945744 \approx 0.9457

To interpolate for x = 1.02, the work would be centered at x = 1.10. With
s = -0.8, there would follow

    f(1.02) \approx (0.72)(0.8415) + (0.36)(0.8912) + (-0.08)(0.9320)
            = 0.852152 \approx 0.8522

The given data correspond to rounded values of f(x) = sin x, and the results
thus correspond to the tabulated five-place values sin 1.24 = 0.94578 and
sin 1.02 = 0.85211.
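The two interpolations above may be reproduced without the table by computing the coefficients from (3.4.5) and (3.4.6) with m = 1. The following is a minimal sketch; the function names are illustrative assumptions.

```python
def three_point_coefficients(s):
    """Return L_{-1}(s), L_0(s), L_1(s) for m = 1, from (3.4.5) and (3.4.6)."""
    return s * (s - 1) / 2, 1 - s * s, s * (s + 1) / 2

def three_point_interpolate(x_center, h, f_minus, f_center, f_plus, x):
    """Quadratic interpolation centered at x_center with uniform spacing h."""
    s = (x - x_center) / h
    l_minus, l_zero, l_plus = three_point_coefficients(s)
    return l_minus * f_minus + l_zero * f_center + l_plus * f_plus

# f(1.24), centered at x = 1.20 (s = 0.4)
print(three_point_interpolate(1.20, 0.10, 0.8912, 0.9320, 0.9636, 1.24))
# f(1.02), centered at x = 1.10 (s = -0.8)
print(three_point_interpolate(1.10, 0.10, 0.8415, 0.8912, 0.9320, 1.02))
```

The printed values agree with 0.945744 and 0.852152 to within floating-point roundoff.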

Extensive tables of Lagrange coefficient functions, and of certain of their 
derivatives, may be found in the literature (see Sec. 3.12). 


3.5 Newton-Cotes Integration Formulas 


In order to obtain formulas for the approximate evaluation of an integral of the
form ∫_a^b f(x) dx, where a and b are finite, we may first introduce the change of
variables

    x = a + \frac{b - a}{n} s        (3.5.1)

where n is an integer, to obtain the relation

    \int_a^b f(x)\,dx = \frac{b - a}{n} \int_0^n F(s)\,ds        (3.5.2)

where

    F(s) = f\left(a + \frac{b - a}{n} s\right)        (3.5.3)

If now it is assumed that f(x) can be approximated over [a, b] by the
polynomial which agrees with it at, say, n + 1 equally spaced points in [a, b],
we may obtain the approximate formula

    \int_0^n F(s)\,ds \approx \sum_{k=0}^{n} C_k F(k)        (3.5.4)

where

    C_k = \int_0^n \frac{s(s - 1) \cdots (s - k + 1)(s - k - 1) \cdots (s - n)}{k(k - 1) \cdots (k - k + 1)(k - k - 1) \cdots (k - n)}\,ds        (3.5.5)

In accordance with (2.5.3), the error term omitted on the right in (3.5.4)
can be expressed in the form

    E = \int_0^n s(s - 1) \cdots (s - n) F[0, 1, \ldots, n, s]\,ds        (3.5.6)
Since the coefficient of the divided difference of F is not of constant sign in
[0, n] unless n = 1, the second law of the mean cannot be applied directly.
However, it is possible to prove (see Sec. 5.12 and Steffensen [1950]) that
when n is odd, the error can be expressed in the form which would be obtained
if this procedure were valid:

    E = \frac{F^{(n+1)}(\sigma)}{(n+1)!} \int_0^n s(s - 1) \cdots (s - n)\,ds    (n odd)        (3.5.7a)

whereas, when n is even, the error can be expressed in the form

    E = \frac{F^{(n+2)}(\sigma)}{(n+2)!} \int_0^n \left(s - \frac{n}{2}\right) s(s - 1) \cdots (s - n)\,ds    (n even)        (3.5.7b)

where 0 < σ < n in each case.

If, as before, we write h = (b - a)/n and x_i = a + hi, the result established
can be put in the more explicit form

    \int_{x_0}^{x_n} f(x)\,dx \approx h \sum_{k=0}^{n} C_k f(x_k)        (3.5.8)

where, for a given value of n, C_k is defined by (3.5.5). By noticing that

    \frac{d^r}{ds^r} F(s) = h^r \frac{d^r}{dx^r} f(x)

and that, from (3.5.2), the error in (3.5.8) is hE, we obtain also the expressions

    E_n = \frac{h^{n+2} f^{(n+1)}(\xi)}{(n+1)!} \int_0^n s(s - 1) \cdots (s - n)\,ds    (n odd)        (3.5.9a)

and

    E_n = \frac{h^{n+3} f^{(n+2)}(\xi)}{(n+2)!} \int_0^n \left(s - \frac{n}{2}\right) s(s - 1) \cdots (s - n)\,ds    (n even)        (3.5.9b)

where x_0 < ξ < x_n in each case.
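In illustration, we consider the case n = 2. Here there follows, from
(3.5.5),

    C_0 = \int_0^2 \frac{(s - 1)(s - 2)}{(-1)(-2)}\,ds = \frac{1}{3} \qquad
    C_1 = \int_0^2 \frac{s(s - 2)}{(1)(-1)}\,ds = \frac{4}{3} \qquad
    C_2 = \int_0^2 \frac{s(s - 1)}{(2)(1)}\,ds = \frac{1}{3}

and (3.5.9b) gives

    E_2 = \frac{h^5 f^{iv}(\xi)}{24} \int_0^2 s(s - 1)^2 (s - 2)\,ds = -\frac{h^5 f^{iv}(\xi)}{90}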
The corresponding formula (3.5.8), with the error term, then takes the form

    \int_{x_0}^{x_2} f(x)\,dx = \frac{h}{3}(f_0 + 4f_1 + f_2) - \frac{h^5}{90} f^{iv}(\xi)    (x_0 < \xi < x_2)

This is the celebrated formula of Simpson’s rule.
In a similar way, the following formulas may be obtained:

    \int_{x_0}^{x_1} f(x)\,dx = \frac{h}{2}(f_0 + f_1) - \frac{h^3}{12} f''(\xi)        (3.5.10)

    \int_{x_0}^{x_2} f(x)\,dx = \frac{h}{3}(f_0 + 4f_1 + f_2) - \frac{h^5}{90} f^{iv}(\xi)        (3.5.11)

    \int_{x_0}^{x_3} f(x)\,dx = \frac{3h}{8}(f_0 + 3f_1 + 3f_2 + f_3) - \frac{3h^5}{80} f^{iv}(\xi)        (3.5.12)

    \int_{x_0}^{x_4} f(x)\,dx = \frac{2h}{45}(7f_0 + 32f_1 + 12f_2 + 32f_3 + 7f_4) - \frac{8h^7}{945} f^{vi}(\xi)        (3.5.13)

    \int_{x_0}^{x_5} f(x)\,dx = \frac{5h}{288}(19f_0 + 75f_1 + 50f_2 + 50f_3 + 75f_4 + 19f_5)
                               - \frac{275h^7}{12096} f^{vi}(\xi)        (3.5.14)

    \int_{x_0}^{x_6} f(x)\,dx = \frac{h}{140}(41f_0 + 216f_1 + 27f_2 + 272f_3 + 27f_4
                               + 216f_5 + 41f_6) - \frac{9h^9}{1400} f^{viii}(\xi)        (3.5.15)

    \int_{x_0}^{x_7} f(x)\,dx = \frac{7h}{17280}(751f_0 + 3577f_1 + 1323f_2 + 2989f_3
                               + 2989f_4 + 1323f_5 + 3577f_6 + 751f_7)
                               - \frac{8183h^9}{518400} f^{viii}(\xi)        (3.5.16)

    \int_{x_0}^{x_8} f(x)\,dx = \frac{4h}{14175}(989f_0 + 5888f_1 - 928f_2 + 10496f_3 - 4540f_4
                               + 10496f_5 - 928f_6 + 5888f_7 + 989f_8)
                               - \frac{2368h^{11}}{467775} f^{x}(\xi)        (3.5.17)
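The weights in these closed formulas can be checked by carrying out the integration in (3.5.5) exactly. The sketch below is an illustration, not the derivation used in the text; the function name `newton_cotes_weights` and the coefficient-list representation of polynomials are assumptions made here.

```python
from fractions import Fraction

def newton_cotes_weights(n):
    """Closed Newton-Cotes weights C_0, ..., C_n of (3.5.5), as exact fractions."""
    weights = []
    for k in range(n + 1):
        # Numerator polynomial: product over j != k of (s - j), stored as a
        # coefficient list with the lowest power first.
        poly = [Fraction(1)]
        denom = Fraction(1)
        for j in range(n + 1):
            if j == k:
                continue
            new = [Fraction(0)] * (len(poly) + 1)
            for p, c in enumerate(poly):
                new[p] += -j * c      # contribution of the constant term -j
                new[p + 1] += c       # contribution of the factor s
            poly = new
            denom *= (k - j)
        # Integrate term by term from 0 to n.
        integral = sum(c * Fraction(n) ** (p + 1) / (p + 1)
                       for p, c in enumerate(poly))
        weights.append(integral / denom)
    return weights

for n in (1, 2, 3, 4):
    print(n, [str(w) for w in newton_cotes_weights(n)])
```

Multiplying the weights by h gives the coefficients of the ordinates in (3.5.10) to (3.5.17); for n = 8 the computation reproduces the negative weights of f_2, f_4, and f_6.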


An inspection of the error terms reveals that a formula involving an odd
number n + 1 = 2m + 1 of points would yield exact results if f(x) were a
polynomial of degree n + 1 or less, whereas one involving an even number
n + 1 = 2m of points would be exact only if f(x) were a polynomial of degree
n or less. Thus the two formulas involving 2m and 2m - 1 ordinates have the
same order of accuracy, so that generally no great advantage is gained by
advancing from a formula involving an odd number of ordinates to one involving
one more ordinate. In particular, the error in Simpson’s rule (3.5.11) is
given by -h^5 f^{iv}(ξ_1)/90, and that in Newton’s rule (3.5.12) by -3h^5 f^{iv}(ξ_2)/80,
where both ξ_1 and ξ_2 are in (a, b). In comparing these errors, when both
formulas are applied to the evaluation of the same integral, we must notice that
h = (b - a)/2 in the former case, whereas h = (b - a)/3 in the latter. Hence
the coefficient of -(b - a)^5 f^{iv} is 1/2880 in Simpson’s rule and 1/6480 in Newton’s
rule. Thus the latter (which involves one extra ordinate) may be expected to
be only slightly more accurate than the former, on the average. Clearly, the
advantage may be shifted in either direction if f^{iv}(x) varies strongly over [a, b],
so that f^{iv}(ξ_1) and f^{iv}(ξ_2) may differ appreciably, or if f^{iv}(x) fails to exist or is
discontinuous somewhere in (a, b), so that the error formulas are invalid.
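The comparison of the coefficients 1/2880 and 1/6480 may be tested numerically. In the sketch below the integrand e^x on [0, 1] is an assumption chosen for illustration (its fourth derivative varies only moderately over the interval), and the function names are likewise illustrative.

```python
import math

def simpson(f, a, b):
    """Single application of Simpson's rule (3.5.11) over [a, b]."""
    h = (b - a) / 2
    return h / 3 * (f(a) + 4 * f(a + h) + f(b))

def newton_three_eighths(f, a, b):
    """Single application of Newton's rule (3.5.12) over [a, b]."""
    h = (b - a) / 3
    return 3 * h / 8 * (f(a) + 3 * f(a + h) + 3 * f(a + 2 * h) + f(b))

true_value = math.e - 1.0
err_simpson = simpson(math.exp, 0.0, 1.0) - true_value
err_newton = newton_three_eighths(math.exp, 0.0, 1.0) - true_value
# The ratio of the errors should be near 6480/2880 = 2.25.
print(err_simpson, err_newton, err_simpson / err_newton)
```

Both rules are exact for cubics, so the difference between them appears only through the fourth derivative, as the error terms indicate.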

Another useful set of integration formulas is obtained by dividing the 
interval [a, b], as before, into n equal parts by inserting n — 1 equally spaced 
interior abscissas, then approximating f(x) by the polynomial of degree n — 2 
which coincides with f(x) at the n — 1 interior points, and approximating the 
relevant integral by integrating the resultant polynomial over [a,b]. These 
formulas thus do not involve the ordinates at the ends of the interval and are 
said to be of open type, whereas those previously considered are said to be of 
closed type. The first few such formulas (n = 2, ..., 6) may be expressed as
follows:

    \int_{x_0}^{x_2} f(x)\,dx = 2h f_1 + \frac{h^3}{3} f''(\xi)        (3.5.18)

    \int_{x_0}^{x_3} f(x)\,dx = \frac{3h}{2}(f_1 + f_2) + \frac{3h^3}{4} f''(\xi)        (3.5.19)

    \int_{x_0}^{x_4} f(x)\,dx = \frac{4h}{3}(2f_1 - f_2 + 2f_3) + \frac{14h^5}{45} f^{iv}(\xi)        (3.5.20)

    \int_{x_0}^{x_5} f(x)\,dx = \frac{5h}{24}(11f_1 + f_2 + f_3 + 11f_4) + \frac{95h^5}{144} f^{iv}(\xi)        (3.5.21)

    \int_{x_0}^{x_6} f(x)\,dx = \frac{3h}{10}(11f_1 - 14f_2 + 26f_3 - 14f_4 + 11f_5)
                               + \frac{41h^7}{140} f^{vi}(\xi)        (3.5.22)

It is sometimes more convenient to write (3.5.18) in the form

    \int_{x_0}^{x_1} f(x)\,dx = h f_{1/2} + \frac{h^3}{24} f''(\xi)        (3.5.23)

where f_{1/2} = f(½x_0 + ½x_1), in which form it is often called the midpoint
formula. (See also Prob. 28 for another family of integration formulas of which
it is a member.)

The formulas of the type considered in this section are generally known 
as the Newton-Cotes (or Cotes) formulas. Apart from the midpoint formula 
(3.5.23), the formulas of open type are principally of use in the numerical 
integration of differential equations. In addition, however, they are needed in 
special cases where the values of f(x) at the end points of the interval of integra- 
tion are unavailable, and they tend to be somewhat preferable to comparable 
closed formulas when a derivative of f(x) is singular at one or both of the end 
points (see Sec. 5.10). 

Since all the integration formulas considered in this chapter must, in 
particular, be exact if f(x) is a constant, it follows that the sum of the weighting 
coefficients in any formula must equal the length of the interval of integration. 
Thus, for example, that sum in (3.5.13) is (2h/45)(90) = 4h = b - a.


3.6 Composite Integration Formulas 


In place of using a single polynomial to approximate f(x) over the complete
integration interval [a, b], it is clearly possible to divide [a, b] into subintervals
and to approximate f(x) by a different polynomial over each subinterval. Thus,
for example, by applying the two-point formula (3.5.10) to n successive
subintervals of length h, one obtains the so-called trapezoidal rule:

    \int_a^b f(x)\,dx = h(\tfrac{1}{2} f_0 + f_1 + f_2 + \cdots + f_{n-1} + \tfrac{1}{2} f_n) - \frac{n h^3}{12} f''(\xi)        (3.6.1)

where f_0 = f(a), f_k = f(a + kh), and f_n = f(b), and where now ξ is somewhere
in (a, b). This formula corresponds to replacing the graph of f(x) by
the result of joining the ends of adjacent ordinates by line segments and is of
remarkable simplicity. Whereas it is not of high accuracy, we may notice that,
since here h = (b - a)/n, the error can be written in the form

    E = -\frac{(b - a)^3}{12 n^2} f''(\xi) = -\frac{b - a}{12} h^2 f''(\xi)        (3.6.2)

Hence, if only f''(x) is continuous (and hence bounded) on [a, b], the error
will indeed tend to zero like 1/n^2 as n → ∞ (and like h^2 as h → 0).
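The behavior of the error described by (3.6.2) is easily observed. The following is a minimal sketch of the trapezoidal rule (3.6.1); the function name and the test integrand x^2 on [0, 1] are assumptions chosen so that f''(x) = 2 is constant and the error term is exactly -h^2/6.

```python
def trapezoidal(f, a, b, n):
    """Trapezoidal rule (3.6.1) with n subintervals of length h = (b - a)/n."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return h * total

exact = 1.0 / 3.0
for n in (10, 20, 40):
    approx = trapezoidal(lambda x: x * x, 0.0, 1.0, n)
    h = 1.0 / n
    # The true value minus the approximation should equal -h^2/6.
    print(n, exact - approx, -h * h / 6)
```

Each doubling of n reduces the error by a factor of four, in accordance with the 1/n^2 dependence.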
As will be seen, the accuracy afforded by a k-point Newton-Cotes formula
does not necessarily increase as k increases, and, in fact, the accuracy may
become worse and worse after a certain stage, even though $f(x)$ possess
continuous derivatives of all orders for all real values of x, and even though no
roundoff errors be introduced. In such cases, unless the desired accuracy is
attained before this stage is attained, the use of an alternative such as a composite
(or "compound") rule is essential as well as convenient.

Another advantage of the trapezoidal rule consists of the fact that the
weighting coefficients are nearly equal to each other. For it is easily seen that,
if $n + 1$ ordinates are each liable to random errors of observation (or roundoff),
the RMS error in a linear combination of these ordinates, for which the sum of
the constants of combination is fixed (here equal to $b - a$), is least when the
constants of combination are equal. Newton-Cotes formulas of the open type
are particularly objectionable from this point of view, since, for $n = 4$ and
$n = 6$, their coefficients actually fluctuate in sign. Similar sign fluctuations also
occur in formulas of closed type for $n = 8$ and for $n \ge 10$.

If the midpoint formula (3.5.23) is applied to each of the $n$ subintervals
used in deriving (3.6.1), one obtains instead the repeated midpoint rule:

$$\int_a^b f(x)\,dx = h\left(f_{1/2} + f_{3/2} + \cdots + f_{n-1/2}\right) + \frac{n h^3}{24} f''(\xi) \tag{3.6.3}$$

where $f_{k+1/2} = f[a + (k + \tfrac{1}{2})h]$. This formula results from approximating
$f(x)$ by a piecewise-constant step function, with a jump $f_{k+1/2} - f_{k-1/2}$ at
each point $x_k = a + kh$. For a given value of $n$, the magnitude of its error
tends (on the average) to be about half that associated with the trapezoidal
formula (when $f$ is twice differentiable) in spite of its using one less ordinate.
Additional advantages follow from the equality of its weighting coefficients.
(An offsetting computational drawback is pointed out in the next section.)

By dividing the interval $[a, b]$ into $n/2$ subintervals of length $2h$, where
$n$ is an even integer, and applying Simpson's rule to each subinterval [that is,
by approximating the graph of $f(x)$ by a parabola in each subinterval], the
so-called parabolic rule is obtained in the form

$$\int_a^b f(x)\,dx = \frac{h}{3}\left(f_0 + 4f_1 + 2f_2 + 4f_3 + \cdots + 4f_{n-3} + 2f_{n-2} + 4f_{n-1} + f_n\right) - \frac{n h^5}{180} f^{iv}(\xi) \tag{3.6.4}$$

where, again, $f_0 = f(a)$, $f_k = f(a + kh)$, and $f_n = f(b)$. Here we have

$$E_P = -\frac{(b - a)^5}{180 n^4} f^{iv}(\xi) \tag{3.6.5}$$

so that, if $f^{iv}(x)$ is continuous on $[a, b]$, the error associated with the use of
$n + 1$ ordinates tends to zero like $1/n^4$ as $n \to \infty$, and like $h^4$ as $h \to 0$. Thus
the parabolic rule usually is more accurate than the trapezoidal and repeated
midpoint rules when $n$ is sufficiently large. Since also its weighting coefficients
are simple and do not fluctuate unduly in magnitude, it is perhaps the most
widely used single formula for numerical integration. However, it can be used
only when ordinates are available which divide $[a, b]$ into an even number of
intervals of equal length $h$; also, in cases where a high degree of accuracy is
required, a prohibitively large number of such ordinates may be needed.

It may be noted that the trapezoidal, repeated midpoint, and parabolic
rules each approximate the required integral $I$ by an associated Riemann sum
(see Prob. 29), so that each of these approximations assuredly converges to $I$
as the spacing $h$ tends to zero if only $I$ exists in the usual (Riemann) sense
whether or not $f(x)$ is sufficiently differentiable to ensure the respective rates
of convergence to which reference was made above.

It is useful to notice that when $f''(x)$ is of constant sign in $[a, b]$, the true
value of $I$ is between the trapezoidal and repeated midpoint approximations
based on a common spacing $h$. Also, if these approximations are denoted by
$I_T(h)$ and $I_M(h)$ in the general case, there follows

$$I_P\left(\frac{h}{2}\right) = \frac{1}{3}\left[I_T(h) + 2 I_M(h)\right] \tag{3.6.6}$$

where $I_P(h/2)$ is the parabolic-rule approximation based on a halved spacing.
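The three composite rules of this section, without their error terms, can be sketched as follows. This is a minimal illustration and not part of the text: the function names and the sample integrand $1/(1 + x)$ on $[0, 1]$ are assumptions chosen for the demonstration. The last line evaluates both sides of relation (3.6.6) so that they can be compared.

```python
def trapezoidal(f, a, b, n):
    """Composite trapezoidal rule (3.6.1) with n subintervals of length h."""
    h = (b - a) / n
    interior = sum(f(a + k * h) for k in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


def midpoint(f, a, b, n):
    """Repeated midpoint rule (3.6.3) with n subintervals of length h."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def parabolic(f, a, b, n):
    """Parabolic (composite Simpson) rule (3.6.4); n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + k * h) for k in range(1, n, 2))
    even = sum(f(a + k * h) for k in range(2, n, 2))
    return (h / 3.0) * (f(a) + 4.0 * odd + 2.0 * even + f(b))


# Illustrative integrand (an assumption made for this sketch).
f = lambda x: 1.0 / (1.0 + x)
n = 4
i_t = trapezoidal(f, 0.0, 1.0, n)
i_m = midpoint(f, 0.0, 1.0, n)
i_p_half = parabolic(f, 0.0, 1.0, 2 * n)   # parabolic rule with spacing h/2
print(i_t, i_m, i_p_half, (i_t + 2.0 * i_m) / 3.0)
```

The trapezoidal and midpoint values are kept separate here because their weighted mean supplies the parabolic value at the halved spacing with no further evaluations of the integrand.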

The use of such formulas in two-dimensional integration over a rectangle 
is illustrated in Probs. 40 and 41. Other integration formulas are considered 
in Chaps. 5 and 8 and in Sec. 9.13. 


3.7 Use of Integration Formulas 


In order to illustrate the preceding formulas in a simple case, we consider first
the numerical evaluation of the integral

$$\int_0^1 \frac{dx}{1 + x} = \log 2 = 0.69314718\cdots \tag{3.7.1}$$

With $f(x) = 1/(1 + x)$, there follows also

$$f^{(k)}(x) = \frac{(-1)^k\, k!}{(1 + x)^{k+1}}$$

and hence

$$\frac{k!}{2^{k+1}} \le (-1)^k f^{(k)}(x) \le k! \qquad (0 \le x \le 1)$$

Thus, for example, if use were to be made of the five-point formula (3.5.13),
with $h = 0.25$, the upper and lower bounds

$$0.000002 < -E_5 < 0.0004$$

would be available with regard to the truncation error. Since $f^{(k)}(x)$ is positive
on $[0, 1]$ when $k$ is even, it follows that each error term will be negative.

The following table of upper bounds on the magnitude of the possible
error relevant to the (closed) Newton-Cotes (NC), trapezoidal (T), and parabolic
(P) rules, in the present case, is easily determined:

    Ordinates    NC                    T                     P
        3        $9 \times 10^{-3}$    $5 \times 10^{-2}$    $9 \times 10^{-3}$
        5        $4 \times 10^{-4}$    $2 \times 10^{-2}$    $6 \times 10^{-4}$
        7        $3 \times 10^{-5}$    $5 \times 10^{-3}$    $2 \times 10^{-4}$
        9        $2 \times 10^{-6}$    $3 \times 10^{-3}$    $4 \times 10^{-5}$


More generally, it can be predicted in this case that the magnitude of the
error involved in the use of $n + 1$ ordinates (where $n$ is even) will be between
$1/(240n^4)$ and $2/(15n^4)$ for the parabolic rule, between $1/(48n^2)$ and $1/(6n^2)$ for the
trapezoidal rule, and between $1/[96(n + 1)^2]$ and $1/[12(n + 1)^2]$ for the repeated
midpoint rule. Such preliminary upper bounds may be quite conservative, but
it is difficult to obtain more precise ones.
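The predicted ranges for the trapezoidal and parabolic rules can be compared with the errors actually committed. The following sketch, whose helper names are illustrative assumptions, evaluates the error $I - (\text{approximation})$ for the integral (3.7.1) in ordinary floating-point arithmetic (no rounding of ordinates) and prints it beside the lower and upper bounds quoted above.

```python
import math

EXACT = math.log(2.0)          # value of the integral (3.7.1)


def f(x):
    return 1.0 / (1.0 + x)


def trapezoidal_error(n):
    """I minus the trapezoidal approximation with n subintervals on [0, 1]."""
    h = 1.0 / n
    approx = h * (0.5 * f(0.0) + sum(f(k * h) for k in range(1, n)) + 0.5 * f(1.0))
    return EXACT - approx


def parabolic_error(n):
    """I minus the parabolic-rule approximation with n (even) subintervals."""
    h = 1.0 / n
    odd = sum(f(k * h) for k in range(1, n, 2))
    even = sum(f(k * h) for k in range(2, n, 2))
    return EXACT - (h / 3.0) * (f(0.0) + 4.0 * odd + 2.0 * even + f(1.0))


for n in (2, 4, 6, 8):
    t_low, t_high = 1.0 / (48 * n**2), 1.0 / (6 * n**2)
    p_low, p_high = 1.0 / (240 * n**4), 2.0 / (15 * n**4)
    print(f"n={n}:  T {t_low:.1e} <= {abs(trapezoidal_error(n)):.1e} <= {t_high:.1e}"
          f"   P {p_low:.1e} <= {abs(parabolic_error(n)):.1e} <= {p_high:.1e}")
```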

In addition to the truncation errors, one must consider the effect of
roundoff in the values of the ordinates used in the calculation. If each ordinate is
rounded correctly to $r$ decimal places, so that the maximum error in each ordinate
is not greater than $5 \times 10^{-r-1}$, the maximum corresponding error in the final
calculation is therefore not greater than $5 \times 10^{-r-1}$ times the sum of the
absolute values of the relevant weighting coefficients. If those coefficients are all
positive, this last sum must equal the length of the interval of integration (here
unity). Thus, in the present case, if all weighting coefficients are positive, the
error in the final result, due to inaccuracies in the original data, cannot exceed
the maximum of those inaccuracies. This situation prevails in all the formulas
considered in the preceding tabulation except the Newton-Cotes nine-point
formula, in which a magnification factor of $\frac{6857}{4725} \approx 1.5$ would be involved.
Whereas a considerable amount of cancellation in the errors of roundoffs
would be expected (particularly if the weighting coefficients are nearly equal),
it cannot be guaranteed in any particular case.

Suppose that the ordinates used in the present calculation are to be rounded
to $r$ decimal places, and that the final result is to be in error by less than one unit
in the $r$th decimal place. If the parabolic rule is to be used, with $n + 1$ ordinates,
an even integer $n$ must then be determined such that

$$\frac{2}{15 n^4} < 5 \times 10^{-r-1} \qquad \text{or} \qquad n > 0.72 \times 10^{r/4}$$

The total error, due to truncation and initial roundoff, then could not exceed
$10^{-r}$, under the assumption that intermediate roundoffs are absent or negligible.
For $r = 4$, this condition gives $n > 7.2$, so that nine ordinates would do;
for $r = 5$, it gives $n > 12.8$, so that 15 ordinates would suffice. If the trapezoidal rule were used,
the need for about 58 ordinates would be indicated for guaranteed four-place
accuracy. Reference to the preceding table shows that a Newton-Cotes
formula using nine ordinates would lead to a result in error by less than $2 \times 10^{-6}$
due to truncation. If the ordinates were rounded to five places, the effect of that
roundoff here could be as large as $8 \times 10^{-6}$. Thus the final error could not
exceed $10^{-5}$.
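The selection of $n$ described above amounts to a short search. The sketch below is an illustration only, with assumed function names: it finds the smallest admissible $n$ for which the truncation bound for the integral (3.7.1) falls below $5 \times 10^{-r-1}$, using the upper bounds $2/(15n^4)$ and $1/(6n^2)$ quoted earlier for the parabolic and trapezoidal rules.

```python
def parabolic_intervals(r):
    """Smallest even n with 2/(15 n^4) < 5 * 10^(-r-1)."""
    n = 2
    while 2.0 / (15.0 * n**4) >= 5.0 * 10.0 ** (-(r + 1)):
        n += 2
    return n


def trapezoidal_intervals(r):
    """Smallest n with 1/(6 n^2) < 5 * 10^(-r-1)."""
    n = 1
    while 1.0 / (6.0 * n**2) >= 5.0 * 10.0 ** (-(r + 1)):
        n += 1
    return n


for r in (4, 5):
    n_p = parabolic_intervals(r)
    n_t = trapezoidal_intervals(r)
    print(f"r={r}: parabolic n={n_p} ({n_p + 1} ordinates), "
          f"trapezoidal n={n_t} ({n_t + 1} ordinates)")
```

These counts rest on the conservative bounds and so guarantee the accuracy; fewer ordinates may suffice in fact.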

Actual calculation, with the ordinates rounded to five places, shows that
the error associated with Simpson's rule (three ordinates) is smaller than
$2 \times 10^{-3}$, with the five-ordinate Newton-Cotes formula less than $3 \times 10^{-5}$,
with the five-ordinate trapezoidal rule less than $4 \times 10^{-3}$, with the
four-ordinate midpoint rule less than $2 \times 10^{-3}$, and with the five-ordinate parabolic
rule less than $10^{-3}$. The fact that some of the error predictions were quite
conservative is a consequence of the variation of the higher derivatives of $f(x)$
over $[0, 1]$. Thus, for example, the error estimate for the five-ordinate
Newton-Cotes formula assigned the maximum value 720 to $f^{vi}(x)$ on $[0, 1]$, whereas
all values from $\frac{45}{8}$ to 720 are taken on. A value of about 56 would have given the
proper estimate.

In those cases where f(x) is given empirically or, more generally, in such a 
form that information with regard to bounds on higher derivatives of f(x) is 
not readily accessible, less dependable error estimates may be based on the 
calculation of one or more divided differences of order equal to that of the 
derivative involved in the error estimate. 


3.8 Richardson Extrapolation. Romberg Integration 


Another method of estimating (or partially removing) the truncation error
in an integration formula is frequently useful. Suppose that the error in a
certain formula approximating

$$I = \int_a^b f(x)\,dx$$

can be expressed in the form

$$E = C h^r f^{(r)}(\xi) \tag{3.8.1}$$

where $C$ is independent of $h$, $r$ is a positive integer, and $\xi$ is known only to lie
in $(a, b)$. Let two calculations be made, one with spacing $h_1$ and one with
spacing $h_2$, and denote the corresponding approximations to the true integral
$I$ by $I_1$ and $I_2$, respectively.
Then, if only truncation errors are considered, there follows

$$I - I_1 = C h_1^r f^{(r)}(\xi_1) \qquad I - I_2 = C h_2^r f^{(r)}(\xi_2)$$

The assumption that $f^{(r)}(\xi_1)$ and $f^{(r)}(\xi_2)$ can be (nearly) identified then leads to
the extrapolation formula

$$I \approx \frac{h_1^r I_2 - h_2^r I_1}{h_1^r - h_2^r} = I_2 + \frac{I_2 - I_1}{(h_1/h_2)^r - 1} \tag{3.8.2}$$

In particular, if $h_2 = h_1/2$ there follows

$$I \approx \frac{2^r I_2 - I_1}{2^r - 1} = I_2 + \frac{I_2 - I_1}{2^r - 1} \qquad (h_1 = 2h_2) \tag{3.8.3}$$


This process, often known as Richardson extrapolation,† generally will be
effective if $f^{(r)}(x)$ does not vary rapidly and does not change sign in $[a, b]$.
Usually it may be used with some confidence, in any case, if the correction to be
added to $I_2$ is small relative to $I_2$ itself and if successive approximations appear
to be approaching a limit from one side without oscillation.

In the cases of the trapezoidal and repeated midpoint rules, for which
$r = 2$, the divisor in (3.8.3) is 3, whereas for the parabolic rule, with $r = 4$, the
divisor is 15. For example, in the case of the preceding numerical example,
the parabolic rule yields the approximation $I \approx 0.694444$ with $h = \tfrac{1}{2}$ and the
approximation $I \approx 0.693254$ with $h = \tfrac{1}{4}$. Use of the extrapolation formula
(3.8.3), with $r = 4$, gives

$$I \approx 0.693254 - 0.000079 = 0.693175$$

Thus the error in $I_2$ may be estimated as about $-0.00008$, and the value 0.69317
may be expected to be correct within perhaps one or two units in its last place,
as is indeed the case. When oscillation of the sequence of successive
approximations is present, this procedure may be completely undependable, as will be
illustrated in Sec. 3.9.
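The extrapolation (3.8.2) is a one-line computation once the two approximations are in hand. A minimal sketch follows; the function name and its `ratio` parameter (standing for $h_1/h_2$) are illustrative assumptions. It is applied to the two parabolic-rule values just quoted.

```python
def richardson(i1, i2, r, ratio=2.0):
    """Extrapolate by (3.8.2): i1 uses spacing h1, i2 uses h2 = h1 / ratio.

    Returns the extrapolated value and the correction added to i2, which
    also serves as an estimate of the error in i2.
    """
    correction = (i2 - i1) / (ratio**r - 1.0)
    return i2 + correction, correction


estimate, correction = richardson(0.694444, 0.693254, r=4)
print(f"{estimate:.6f} {correction:.6f}")
```

Returning the correction alongside the extrapolated value makes it easy to check that the correction is small relative to the second approximation before the result is trusted.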

For the trapezoidal and parabolic rules, the process of repeatedly halving
the spacing $h$ is particularly convenient since all ordinates needed in a particular
calculation would be used again in the next one and in all following ones. The
repeated midpoint rule does not have this useful property since, in fact, no
ordinate would be used more than once.

† A procedure of this general type, in which two calculations are made, with errors of
the respective forms $h_1^r \phi(h_1)$ and $h_2^r \phi(h_2)$, where $\phi(h)$ is an incompletely known
function of the spacing $h$, and in which an extrapolation to $h = 0$ is made under the
assumption that $\phi(h)$ is nearly independent of $h$, is also often known as Richardson's
deferred approach to the limit. (See Richardson and Gaunt [1927].)

In order to treat a useful generalization of this procedure, we note next
that (as will be shown in Sec. 5.8) the truncation error associated with the
trapezoidal rule can be expressed in the form

$$E = C_1 h^2 + C_2 h^4 + \cdots + C_N h^{2N} - (b - a)\, h^{2N+2}\, \frac{B_{2N+2}}{(2N+2)!}\, f^{(2N+2)}(\xi) \tag{3.8.4}$$

where the $C$'s are certain constants which depend upon $f$ but are independent of
$h$, and where $B_{2N+2}$ is a Bernoulli number defined in Sec. 5.8. Here the integer
$N$ can be taken to be arbitrarily large, assuming only that $f(x)$ has a $(2N + 2)$th
derivative in $[a, b]$. This form reduces to (3.6.2) when $N$ is taken to be zero
since $B_2 = \tfrac{1}{6}$.

If we now denote by $T_k^{(0)}$ the trapezoidal approximation to $I$ with spacing
$h_k = (b - a)/2^k$, then the Richardson extrapolate based on $T_k^{(0)}$ and $T_{k+1}^{(0)}$ is
$[2^2 T_{k+1}^{(0)} - T_k^{(0)}]/(2^2 - 1)$, according to (3.8.2). If this number is denoted by
$T_k^{(1)}$, then the derivation of (3.8.2) shows that its error, when written in a form
analogous to (3.8.4), will begin with an $h^4$ term, the $h^2$ term having been
eliminated.

If then we denote the Richardson extrapolate based on $T_k^{(1)}$ and $T_{k+1}^{(1)}$ by
$T_k^{(2)}$, it follows that $T_k^{(2)} = [2^4 T_{k+1}^{(1)} - T_k^{(1)}]/(2^4 - 1)$ has an error of order
$h_k^6$, when $h_k$ is small, and this process can be iterated at pleasure if $f(x)$ is
sufficiently differentiable.

In general, if we define

$$T_k^{(m)} = \frac{4^m T_{k+1}^{(m-1)} - T_k^{(m-1)}}{4^m - 1} \qquad (m = 1, 2, \ldots) \tag{3.8.5}$$


then the approximation $T_k^{(m)}$ has an error of order $h_k^{2m+2}$. In fact, it is known
(see Bauer, Rutishauser, and Stiefel [1963]) that

$$I - T_k^{(m)} = (-1)^{m+1}\, (b - a)\, \frac{h_k^{2m+2}\, B_{2m+2}}{2^{m(m+1)}\, (2m+2)!}\, f^{(2m+2)}(\xi) \tag{3.8.6}$$

where $h_k = (b - a)/2^k$, for some $\xi$ in $(a, b)$.
By recursive use of (3.8.5), we may obtain the following triangular array
of approximations to the required integral $I$:

$$\begin{array}{llll}
T_0^{(0)} \\
T_1^{(0)} & T_0^{(1)} \\
T_2^{(0)} & T_1^{(1)} & T_0^{(2)} \\
\;\vdots & \;\vdots & \;\vdots & \ddots
\end{array}$$

Here the successive entries in the first column are the trapezoidal-rule
approximations, with $h_k = (b - a)/2^k$ $(k = 0, 1, 2, \ldots)$, and the remaining
entries can be determined recursively from them.

It can be verified that the entry $T_k^{(1)}$ is the approximation that would be
obtained directly by using Simpson's rule $2^k$ times, and also that $T_k^{(2)}$ could be
obtained directly by using the Newton-Cotes five-point formula (3.5.13) $2^k$
times. No similar interpretation of $T_k^{(m)}$ in terms of Newton-Cotes formulas is
possible for $m \ge 3$.

The process of estimating the required integral by use of the array of
approximations $T_k^{(m)}$ is called Romberg integration (Romberg [1955]). It
has already been noted that the sequences of elements in the first two columns
of the Romberg array converge to the desired integral $I$. In addition, however,
it is known that in fact the sequences of elements in all columns converge to the
same limit if $f(x)$ has derivatives of all orders on $[a, b]$, the rate of convergence
generally (but not always) increasing with $m$. The sequence of diagonal elements
$T_0^{(0)}, T_0^{(1)}, T_0^{(2)}, \ldots$ also converges, in such cases, to the desired value and is usually
the preferred one.
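The recursion (3.8.5) translates directly into a short routine. The sketch below is an illustration under stated assumptions, not a prescribed implementation: the function names and the row layout `rows[j][m]` $= T_{j-m}^{(m)}$ are choices made here, and the first column is built by halving the spacing so that ordinates already computed are reused.

```python
def trapezoid_column(f, a, b, levels):
    """Return [T_0^(0), T_1^(0), ...], halving h and reusing old ordinates."""
    h = b - a
    column = [0.5 * h * (f(a) + f(b))]
    for k in range(1, levels):
        h /= 2.0
        # Only the new (odd-indexed) abscissas need to be evaluated.
        new_points = sum(f(a + (2 * j + 1) * h) for j in range(2 ** (k - 1)))
        column.append(0.5 * column[-1] + h * new_points)
    return column


def romberg(f, a, b, levels):
    """Triangular Romberg array; rows[j][m] holds T_{j-m}^{(m)}."""
    rows = [[t] for t in trapezoid_column(f, a, b, levels)]
    for j in range(1, levels):
        for m in range(1, j + 1):
            t_next = rows[j][m - 1]        # T_{k+1}^{(m-1)}
            t_prev = rows[j - 1][m - 1]    # T_k^{(m-1)}
            rows[j].append((4**m * t_next - t_prev) / (4**m - 1))   # (3.8.5)
    return rows


for row in romberg(lambda x: 1.0 / (1.0 + x), 0.0, 1.0, 4):
    print("  ".join(f"{value:.6f}" for value in row))
```

The last entry of each row is a diagonal element $T_0^{(m)}$, so monitoring successive row ends follows the diagonal sequence mentioned above.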

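It is shown in Sec. 5.8 that the error associated with the repeated midpoint
rule also can be expressed in a form analogous to (3.8.4), so that a completely
similar array of approximations

$$\begin{array}{llll}
M_0^{(0)} \\
M_1^{(0)} & M_0^{(1)} \\
M_2^{(0)} & M_1^{(1)} & M_0^{(2)} \\
\;\vdots & \;\vdots & \;\vdots & \ddots
\end{array}$$

can be associated with it, where

$$M_k^{(m)} = \frac{4^m M_{k+1}^{(m-1)} - M_k^{(m-1)}}{4^m - 1} \tag{3.8.7}$$

Here $M_k^{(0)}$ is the result of applying the midpoint rule $2^k$ times, with the spacing
$h_k = (b - a)/2^k$.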

The abscissas involved in the calculation of $M_k^{(0)}$ are the new abscissas
introduced in the determination of $T_{k+1}^{(0)}$, and, in fact, it is easily seen that

$$T_{k+1}^{(0)} = \tfrac{1}{2}\left[T_k^{(0)} + M_k^{(0)}\right] \tag{3.8.8}$$

Thus, if the midpoint-rule approximations $M_0^{(0)}, M_1^{(0)}, M_2^{(0)}, \ldots$ are calculated
first, then, after $T_0^{(0)}$ is calculated, the successive trapezoidal-rule approximations
$T_1^{(0)}, T_2^{(0)}, \ldots$ can be determined recursively by use of (3.8.8) in a very simple
way.

More generally, since (3.8.5) and (3.8.7) are of the same form, a simple
inductive argument shows that (3.8.8) generalizes to the relation

$$T_{k+1}^{(m)} = \tfrac{1}{2}\left[T_k^{(m)} + M_k^{(m)}\right] \tag{3.8.9}$$

This relation, combined with (3.8.5) for $k = 0$,

$$T_0^{(m)} = \frac{4^m T_1^{(m-1)} - T_0^{(m-1)}}{4^m - 1} \tag{3.8.10}$$

permits a recursive calculation of the entries of the T table from $T_0^{(0)}$ and the
entries of the M table.

Even though the M-table entries were of no interest in themselves, this
method of constructing the T table would be a particularly efficient one. In
practice, the calculation frequently is terminated when the diagonal elements
$T_0^{(m)}$ and $M_0^{(m)}$ agree within the prescribed error tolerance, with their mean
value $T_1^{(m)}$ taken as the final approximation to $I$.
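The construction of the T table from $T_0^{(0)}$ and the M table, by (3.8.7), (3.8.9), and (3.8.10), can be sketched as follows. The dictionary layout keyed by `(k, m)` and the function names are assumptions made for this illustration.

```python
def midpoint_sum(f, a, b, k):
    """M_k^(0): the midpoint rule applied 2**k times."""
    n = 2**k
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))


def m_and_t_tables(f, a, b, levels):
    """Return dicts M and T with M[k, m] = M_k^(m) and T[k, m] = T_k^(m)."""
    M, T = {}, {}
    for k in range(levels):
        M[k, 0] = midpoint_sum(f, a, b, k)
    for m in range(1, levels):
        for k in range(levels - m):
            M[k, m] = (4**m * M[k + 1, m - 1] - M[k, m - 1]) / (4**m - 1)  # (3.8.7)
    T[0, 0] = 0.5 * (b - a) * (f(a) + f(b))
    for m in range(levels):
        if m > 0:
            T[0, m] = (4**m * T[1, m - 1] - T[0, m - 1]) / (4**m - 1)      # (3.8.10)
        for k in range(levels - m):
            T[k + 1, m] = 0.5 * (T[k, m] + M[k, m])                         # (3.8.9)
    return M, T


M, T = m_and_t_tables(lambda x: 1.0 / (1.0 + x), 0.0, 1.0, 4)
for m in range(4):
    # Diagonal elements, their difference, and their mean T_1^(m).
    print(m, f"{T[0, m]:.6f}", f"{M[0, m]:.6f}", f"{abs(T[0, m] - M[0, m]):.1e}", f"{T[1, m]:.7f}")
```

The integrand is evaluated only inside `midpoint_sum` and at the two end points, which is the efficiency the text refers to.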

If a table of entries $P_k^{(m)}$ were generated from parabolic-rule
approximations in correspondence with the T and M tables, it would be identical (apart
from the effects of roundoff errors) with the result of suppressing the first
column of the T table since $P_k^{(m)} = T_k^{(m+1)}$.

To illustrate the use of Romberg integration, the M and T tables relevant
to the approximation of the previously considered integral

$$I = \int_0^1 \frac{dx}{1 + x} = \log 2 = 0.69314718\cdots$$

are presented. Six-place values of the integrand were used, and each entry
was arbitrarily rounded to six places as it was determined throughout the
recursive calculation.

    $M_k^{(0)}$    $M_k^{(1)}$    $M_k^{(2)}$    $M_k^{(3)}$
    0.666667
    0.685714       0.692063
    0.691220       0.693055       0.693121
    0.692660       0.693140       0.693146       0.693147

    $T_k^{(0)}$    $T_k^{(1)}$    $T_k^{(2)}$    $T_k^{(3)}$
    0.750000
    0.708333       0.694444
    0.697024       0.693254       0.693175
    0.694122       0.693155       0.693148       0.693148


3.9 Asymptotic Behavior of Newton-Cotes Formulas 


This section presents some additional results which relate to the choice between
the use of a single Newton-Cotes formula over an entire range of $n + 1$ points,
and the use of a composite formula employing a lower-degree formula repeatedly
over successive subdivisions, when $n$ is large.
The problem consists of examining the behavior of the Newton-Cotes
error term

$$E = h^{n+2} f^{(n+1)}(\xi) \int_0^n \frac{s(s - 1) \cdots (s - n)}{(n + 1)!}\,ds \qquad (n \text{ odd}) \tag{3.9.1}$$

$$E = h^{n+3} f^{(n+2)}(\xi) \int_0^n \left(s - \frac{n}{2}\right) \frac{s(s - 1) \cdots (s - n)}{(n + 2)!}\,ds \qquad (n \text{ even}) \tag{3.9.2}$$

where $n + 1$ is the number of ordinates and $\xi$ is somewhere in the interval of
integration $[a, b]$. As may be seen by an examination of the error terms given
explicitly in Sec. 3.5, the numerical factor represented by the integral in (3.9.1)
or (3.9.2) decreases slowly in magnitude as $n$ increases. Indeed, it can be shown
(see Prob. 49) that when $n$ is sufficiently large, the integral in (3.9.1) is
approximated by $-2/[n(\log n)^2]$, and that in (3.9.2) by one-half that quantity. Thus,
in either case, the numerical factor ultimately tends to zero somewhat more
rapidly than $1/n$ but less rapidly than $1/n^2$.
Hence, after writing $h = (b - a)/n$, we find that†

$$E \sim -\frac{2 (b - a)^{n+2}}{n^{n+3} (\log n)^2} f^{(n+1)}(\xi) \qquad (n \text{ odd}) \tag{3.9.3}$$

$$E \sim -\frac{(b - a)^{n+3}}{n^{n+4} (\log n)^2} f^{(n+2)}(\xi) \qquad (n \text{ even}) \tag{3.9.4}$$

as $n \to \infty$.

If we now suppose that the function $f(z)$, where $z$ is a complex variable, is
analytic in a region of the complex plane including the interval $a \le x \le b$
on the real axis, we can use the fact that then there exists a constant $K$ such that

$$|f^{(n)}(x)| \le K \frac{n!}{R^n} \qquad (a \le x \le b) \tag{3.9.5}$$

where $R$ is the shortest distance, in the complex plane, from a point in $[a, b]$
to a singular point of $f(z)$. (See Prob. 50.) If this relation is combined with
(3.9.3) and (3.9.4), and if use is made of the Stirling approximation to the factorial,

$$n! \sim \sqrt{2\pi n}\; n^n e^{-n} \qquad (n \to \infty) \tag{3.9.6}$$

we can deduce that there exists a constant $C$ such that

$$|E| \le C\, \frac{[(b - a)/eR]^n}{n^{3/2} (\log n)^2} \tag{3.9.7}$$

when $n$ is sufficiently large, whether even or odd.

† The notation $f_n \sim g_n$ again is used to indicate that $f_n/g_n \to 1$ as $n \to \infty$.

Thus it follows that a sufficient condition for convergence of the sequence of
successive Newton-Cotes approximations is the requirement that

$$R > \frac{b - a}{e} \tag{3.9.8}$$

That is, if $\mathcal{R}$ is the region of the complex $z$ plane bounded by line segments of
length $b - a$ parallel to the real segment $[a, b]$ at a distance $(b - a)/e$ above
and below that segment and by semicircles of radius $(b - a)/e$ centered at
$z = a$ and at $z = b$, then the Newton-Cotes sequence of approximations to the
specified integral $I$ will converge to $I$ if $f(z)$ is analytic inside $\mathcal{R}$ and on its
boundary.

A sharper result replaces $\mathcal{R}$ by the region $\mathcal{R}'$ bounded by a certain simple
oval (with a complicated analytical specification) having its longitudinal vertices
at the ends of the interval $[a, b]$ and its lateral vertices at distances of about
$0.26(b - a)$ from the midpoint of that interval. (See Chap. 4, Prob. 43, and
Krylov [1962].)

In the example of Eq. (3.7.1) treated in the last two sections, the only
singularity of the complex function $f(z) = 1/(1 + z)$ is at $z = -1$, and hence
here $R = b - a = 1$. Thus convergence is ensured. The fact that, in that case,
relatively large values of $n$ are needed to supply a specified degree of accuracy
is, however, a consequence of the relative nearness of the singularity.

Nearby singularities at nonreal points are, of course, just as troublesome
as those which occur for real values of $z$. In order to illustrate this fact, we
consider the integral

$$\int_{-4}^{4} \frac{dx}{1 + x^2} = 2 \tan^{-1} 4 \approx 2.6516 \tag{3.9.9}$$

Here, although $f(z) = 1/(1 + z^2)$ is perfectly well behaved when $z$ is real, it
possesses singularities (poles) at the imaginary points $z = \pm i$. Since both these
points lie inside both $\mathcal{R}$ and $\mathcal{R}'$, convergence is in doubt and an ultimate increase
in error magnitude with increasing $n$ may be feared.

Direct calculation indicates that this undesirable situation does indeed
exist, and that it evidences itself at a relatively early stage. The results of
computations involving $n + 1 = 3, 5, 7, 9$, and 11 ordinates and using
Newton-Cotes formulas over the entire range in each case are compared in the following
table with those corresponding to the use of the parabolic rule and of the
trapezoidal rule:

    n + 1    NC       P        T
      3      5.490    5.490    4.235
      5      2.278    2.478    2.918
      7      3.329    2.908    2.701
      9      1.941    2.573    2.659
     11      3.596    2.695    2.6511

It is seen that the best of the Newton-Cotes approximations corresponds to
the use of only five ordinates, and that the errors associated with successive
formulas of higher order oscillate with increasing amplitude about the true
value.
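The Newton-Cotes column of the table can be reproduced with a short computation. The sketch below is illustrative only: the function names are assumptions, and the weights are obtained here by solving the exactness conditions for $1, s, \ldots, s^n$ in rational arithmetic, which is one convenient route and not necessarily the one used for the table.

```python
from fractions import Fraction


def newton_cotes_weights(n):
    """Exact closed Newton-Cotes weights, in units of h, on nodes 0, 1, ..., n."""
    size = n + 1
    # Row j enforces exactness for s**j:  sum_i w_i * i**j = n**(j+1) / (j+1).
    rows = [[Fraction(i) ** j for i in range(size)] + [Fraction(n ** (j + 1), j + 1)]
            for j in range(size)]
    for col in range(size):                        # Gauss-Jordan elimination
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[col])]
    return [rows[i][size] for i in range(size)]


def newton_cotes(f, a, b, n):
    """Single (n + 1)-point closed Newton-Cotes formula over the whole of [a, b]."""
    h = (b - a) / n
    weights = newton_cotes_weights(n)
    return h * sum(float(w) * f(a + i * h) for i, w in enumerate(weights))


g = lambda x: 1.0 / (1.0 + x * x)
for n in (2, 4, 6, 8, 10):
    print(n + 1, f"{newton_cotes(g, -4.0, 4.0, n):.3f}")
```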

The sequence of approximations afforded by the parabolic rule displays
damped oscillations but is, of course, convergent. On the other hand, the
trapezoidal-rule sequence is converging monotonically toward the true value
at a rate which has not yet been exceeded by that of the parabolic-rule sequence,
although the incorporation of additional ordinates eventually would reverse
the advantage.

It is important to notice that the use of Richardson extrapolation on the
P sequence may be undependable here because of the oscillation. Thus, whereas
it gives a good prediction with $n_1 = 4$ and $n_2 = 8$ (the subsequence in which
$n + 1 = 5, 9, 13, \ldots$ is monotonic), the extrapolation based on $n_1 = 6$ and
$n_2 = 8$ is worse than either of the approximations upon which it is based.

The preceding example is intended, not generally to discredit the Newton- 
Cotes formulas which use a fairly large number of ordinates, but to serve as a 
warning that there exist many nonpathological situations in which their use is 
not appropriate. Such situations generally can be recognized in advance when 
$f(x)$ is given analytically. For example, if the Taylor-series expansions of $f(x)$
converge for all $x$ [as for $e^{-x^2}$, $J_0(x)$, and so forth], convergence is ensured
(when roundoff errors are suitably controlled).

In illustration, the following table compares successive Newton-Cotes, 
parabolic-rule, and trapezoidal-rule approximations to the integral 


$$\int_0^6 \sin x\,dx \approx 0.0398297 \tag{3.9.10}$$


when no use is made of special properties of the integrand: 


n+1 NC P T 
3 0.2850645 0.2850645 0.0042368 
5 0.0250938 0.0413420 0.0320657 
7 0.0405433 0.0400804 0.0364539 
9 0.0398047 0.0399047 0.0379450 
11 0.0398300 0.0398597 0.0386276 


When only tabular values of $f(x)$ are available and no information can be
obtained with respect to its analytical nature, the probability of a favorable
behavior of the Newton-Cotes sequence with increasing $n$ is difficult to estimate.
In such cases, the use of a composite formula (such as the parabolic, repeated
midpoint, or trapezoidal rule) probably is to be preferred. If also the data are
empirical and may possess significant inherent errors, it generally is desirable to
smooth the data before using them for numerical integration or for any other
such purpose (see Sec. 7.15).

As previously noted, the presence of negative weighting coefficients in
the Newton-Cotes formulas with $n = 8$ and $n = 10$ makes the sum of the
magnitudes of those coefficients exceed the minimum value $b - a$ and hence
increases the possible effects of roundoff errors in those cases. Whereas the
magnification factor increases unboundedly as $n \to \infty$, it has reasonably modest
values rounding to 1.5 and 3.1, respectively, in the cases of the nine- and
eleven-point formulas.
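The magnification factor, that is, the sum of the magnitudes of the weighting coefficients divided by their sum, can be computed exactly. In the sketch below, which uses assumed function names, each weight is found by integrating the corresponding Lagrange basis polynomial over $0 \le s \le n$ in rational arithmetic.

```python
from fractions import Fraction


def closed_weights(n):
    """Exact closed Newton-Cotes weights (units of h) from Lagrange basis integrals."""
    weights = []
    for i in range(n + 1):
        coeffs = [Fraction(1)]                  # polynomial in s, lowest power first
        for j in range(n + 1):
            if j != i:
                # Multiply the current polynomial by (s - j) / (i - j).
                scaled = [c / (i - j) for c in coeffs]
                coeffs = [Fraction(0)] + scaled             # s * p(s)
                for p, c in enumerate(scaled):
                    coeffs[p] -= j * c                      # minus j * p(s)
        # Integrate term by term from 0 to n.
        weights.append(sum(c * Fraction(n) ** (p + 1) / (p + 1)
                           for p, c in enumerate(coeffs)))
    return weights


def magnification(n):
    """Sum of |weights| divided by the sum of the weights."""
    w = closed_weights(n)
    return sum(abs(x) for x in w) / sum(w)


for n in (6, 8, 10):
    factor = magnification(n)
    print(n + 1, factor, round(float(factor), 1))
```

A factor of exactly 1 signals that all coefficients are positive, so that roundoff in the ordinates is not amplified.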


3.10 Weighting Functions. Filon Integration 


A formula approximating an integral of the form

$$\int_a^b w(x) f(x)\,dx \tag{3.10.1}$$

by a linear combination of values of $f(x)$, rather than by a combination of values
of the complete integrand, can be obtained, for example, by first approximating
$f(x)$ by a polynomial and then multiplying that polynomial by $w(x)$ and carrying
out the integration. As in Sec. 3.6, different polynomial approximations may be
used over consecutive subintervals. Such formulas are desirable when $w(x)$
exhibits an unfavorable behavior in $[a, b]$, but the function $f(x)$ itself can be
satisfactorily approximated by a polynomial of moderate degree on that
interval or on each of a moderate number of subintervals. Examples are
provided by the integrals


$$\int_0^1 x^{1/2} f(x)\,dx \qquad \int_0^1 f(x) \log x\,dx \qquad \int_{-1}^{1} \frac{f(x)}{\sqrt{1 - x^2}}\,dx$$


where, in each case, f(x) is a well-behaved function. 
In particular, such special formulas are desirable for the approximation
of the integrals

$$\int_a^b f(x) \cos kx\,dx \qquad \int_a^b f(x) \sin kx\,dx \tag{3.10.2}$$

where $k$ is large, because of the consequent rapid oscillation of the integrands.
If, with the notation of Sec. 3.6, the interval $[a, b]$ is divided into $n$ equal parts
of length $h$ so that

$$h = \frac{b - a}{n} \tag{3.10.3}$$

where $n$ is even, and if over each double subinterval of length $2h$ the function
is approximated by the second-degree polynomial agreeing with $f(x)$ at the three
relevant division points, a development analogous to that which leads to
Simpson's rule produces formulas associated with the name of Filon. (The
algebraic details of the derivation are uninteresting and are omitted.)

With the previously used abbreviations

$$x_r = x_0 + rh \qquad f(x_r) = f_r \qquad (r = 0, 1, \ldots, n) \tag{3.10.4}$$

where

$$x_0 = a \qquad x_n = b$$

we first define the even and odd cosine sums

$$C_e = \tfrac{1}{2} f_0 \cos kx_0 + f_2 \cos kx_2 + f_4 \cos kx_4 + \cdots + f_{n-2} \cos kx_{n-2} + \tfrac{1}{2} f_n \cos kx_n \tag{3.10.5}$$

and

$$C_o = f_1 \cos kx_1 + f_3 \cos kx_3 + \cdots + f_{n-1} \cos kx_{n-1} \tag{3.10.6}$$

where $C_e$ involves only ordinates with even subscripts and $C_o$ only those with
odd subscripts. It may be noted that $2hC_e$ and $2hC_o$ would afford the trapezoidal
and repeated midpoint approximations, respectively, to the cosine integral
relative to the spacing $2h$.

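With the additional abbreviations

$$\theta = kh = k\,\frac{b - a}{n} \tag{3.10.7}$$

and

$$\alpha(\theta) = \frac{\theta^2 + \tfrac{1}{2}\theta \sin 2\theta - 2 \sin^2 \theta}{\theta^3} \tag{3.10.8}$$

$$\beta(\theta) = 2\,\frac{\theta(1 + \cos^2 \theta) - \sin 2\theta}{\theta^3} \tag{3.10.9}$$

$$\gamma(\theta) = 4\,\frac{\sin \theta - \theta \cos \theta}{\theta^3} \tag{3.10.10}$$

Filon's cosine formula can be written in the form

$$\int_a^b f(x) \cos kx\,dx = h\left[\alpha (f_n \sin kx_n - f_0 \sin kx_0) + \beta C_e + \gamma C_o\right] + E \tag{3.10.11}$$

where the error term $E$ accordingly vanishes when $f(x)$ is a polynomial of
degree 2 or less.†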


† Unless $k = 0$, it does not vanish when $f(x)$ is a polynomial of degree 3 despite
occasional implications to the contrary in the literature.

In fact, the error associated with a typical subinterval $[x_{2r}, x_{2r+2}]$ is
expressible in the form

$$-\frac{h^5}{90}\left[f^{iv}(\xi_1) \cos k\xi_2 - 4k f'''(\xi_3) \sin k\xi_4\right]$$

where all $\xi$'s lie in that subinterval (see Prob. 52). Since the total error $E$ is the
superposition of $n/2$ such contributions, it follows that

$$|E| \le \frac{b - a}{180}\, h^4 (M_4 + 4kM_3) \tag{3.10.12}$$

where

$$|f'''(x)| \le M_3 \qquad |f^{iv}(x)| \le M_4 \qquad (a \le x \le b) \tag{3.10.13}$$

The bound (3.10.12) reduces when $k = 0$ to that deducible from (3.6.5) in
accordance with the fact that (3.10.11) must reduce to the parabolic rule when
$k = 0$. This reduction can be confirmed directly by use of the expansions

$$\alpha = \frac{2\theta^3}{45} - \cdots \qquad \beta = \frac{2}{3} + \frac{2\theta^2}{15} - \cdots \qquad \gamma = \frac{4}{3} - \frac{2\theta^2}{15} + \cdots \tag{3.10.14}$$

when $\theta$ is small.
In addition, Filon's sine formula can be written in the form

$$\int_a^b f(x) \sin kx\,dx = h\left[-\alpha (f_n \cos kx_n - f_0 \cos kx_0) + \beta S_e + \gamma S_o\right] + E \tag{3.10.15}$$

where $S_e$ and $S_o$ are the even and odd sine sums obtained by replacing cos by sin
in (3.10.5) and (3.10.6), respectively, and where the error $E$ differs somewhat from
the corresponding term in (3.10.11) but again satisfies (3.10.12).

Since the attainable bound (3.10.12) can be written in the form

$$|E| \le \frac{b - a}{180}\, h^3 (4\theta M_3 + h M_4) \tag{3.10.16}$$

it follows that $|E|$ may be expected to tend to increase linearly with $\theta$ when $\theta$
is reasonably large so that large values of $k$ still tend to require small values of $h$.
It is usually recommended that $\theta = kh$ not exceed about 1.0 or 1.5.
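Filon's cosine formula (3.10.11), with the error term omitted, can be sketched as follows. The function name and argument order are assumptions made for this illustration. The coefficients are evaluated from the closed forms (3.10.8) to (3.10.10), which suffer cancellation when $\theta$ is very small; in that case the expansions (3.10.14) would be the safer choice, and that refinement is not attempted here.

```python
import math


def filon_cos(f, a, b, k, n):
    """Approximate the integral of f(x) cos(kx) over [a, b] by (3.10.11).

    n must be even and k nonzero; the closed forms for alpha, beta, gamma
    are used directly, so theta = k*h should not be extremely small.
    """
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    theta = k * h
    s, c = math.sin(theta), math.cos(theta)
    alpha = (theta**2 + theta * s * c - 2.0 * s**2) / theta**3     # (3.10.8)
    beta = 2.0 * (theta * (1.0 + c**2) - 2.0 * s * c) / theta**3   # (3.10.9)
    gamma = 4.0 * (s - theta * c) / theta**3                       # (3.10.10)
    x = [a + r * h for r in range(n + 1)]
    g = [f(xr) * math.cos(k * xr) for xr in x]
    c_even = sum(g[0:n + 1:2]) - 0.5 * (g[0] + g[n])               # (3.10.5)
    c_odd = sum(g[1:n:2])                                          # (3.10.6)
    boundary = f(b) * math.sin(k * b) - f(a) * math.sin(k * a)
    return h * (alpha * boundary + beta * c_even + gamma * c_odd)


k = 10.0
exact = math.sin(k) / k + 2.0 * math.cos(k) / k**2 - 2.0 * math.sin(k) / k**3
print(filon_cos(lambda x: x * x, 0.0, 1.0, k, 10), exact)
```

The usage line integrates $x^2 \cos 10x$ over $[0, 1]$ with $\theta = 1$; since the integrand's factor $f(x)$ is a polynomial of degree 2, the two printed values should agree to within roundoff.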

Various modifications and generalizations are possible. In particular, it
is found that when $f(x) = x^3$, the error in the cosine formula associated with
the $r$th subinterval $[x_{2r}, x_{2r+2}]$ is expressible in the form $h^4 \delta(\theta) \sin kx_{2r+1}$,
where

$$\delta(\theta) = 4\,\frac{(3 - \theta^2) \sin \theta - 3\theta \cos \theta}{\theta^4} \tag{3.10.17}$$

so that $\delta(\theta)$ does not vary with $r$. It then follows that if one defines the sine
correction sum

$$S_c = \frac{h^3}{6}\left[f'''(x_1) \sin kx_1 + f'''(x_3) \sin kx_3 + \cdots + f'''(x_{n-1}) \sin kx_{n-1}\right] \tag{3.10.18}$$

then the result of adding the term

$$h\,\delta(\theta)\,S_c \tag{3.10.19}$$

to the approximation (3.10.11) and subtracting an equivalent term from its
error $E$ will be a formula in which the new error $E$ vanishes when $f(x)$ is any
polynomial of degree 3 or less. It is found to satisfy the inequality


$$|E| \le \frac{b - a}{180}\, h^4 (1 + 4\theta) M_4 \tag{3.10.20}$$


with the notation of (3.10.7) and (3.10.13). (See Prob. 54.) 
Similarly, with the definition

$$C_c = -\frac{h^3}{6}\left[f'''(x_1) \cos kx_1 + f'''(x_3) \cos kx_3 + \cdots + f'''(x_{n-1}) \cos kx_{n-1}\right] \tag{3.10.21}$$


the addition of the term

$$h\,\delta(\theta)\,C_c \tag{3.10.22}$$

to the approximation (3.10.15) leads to an error which also satisfies (3.10.20).
When $\theta$ is small, it is found that

$$\delta(\theta) = \frac{4\theta}{15} - \frac{2\theta^3}{105} + \cdots \tag{3.10.23}$$

If only the first term is retained in (3.10.23), the correction (3.10.19) is
approximated by the sum

$$\frac{2kh^5}{45}\left[f'''(x_1) \sin kx_1 + f'''(x_3) \sin kx_3 + \cdots + f'''(x_{n-1}) \sin kx_{n-1}\right] \tag{3.10.24}$$

and the correction (3.10.22) by a similar sum. These two corrections to the
Filon formulas appear in the literature (see NBS [1964] and Froberg [1969]).


3.11 Differentiation Formulas 


To conclude this chapter, we list a few formulas which may be used for numerical
differentiation of tabulated functions at tabular points when the need for such a
calculation cannot be avoided.
By differentiating three- and five-point lagrangian interpolation formulas
and evaluating the results at tabular points (see Sec. 3.4), the following
symmetrical sets of derivative formulas may be obtained, with a convenient
renumbering of the ordinates.

Three-point formulas:

$$f'_{-1} = \frac{1}{2h}(-3f_{-1} + 4f_0 - f_1) + \frac{h^2}{3} f'''(\xi) \tag{3.11.1}$$

$$f'_0 = \frac{1}{2h}(-f_{-1} + f_1) - \frac{h^2}{6} f'''(\xi) \tag{3.11.2}$$

$$f'_1 = \frac{1}{2h}(f_{-1} - 4f_0 + 3f_1) + \frac{h^2}{3} f'''(\xi) \tag{3.11.3}$$

Five-point formulas:

$$f'_{-2} = \frac{1}{12h}(-25f_{-2} + 48f_{-1} - 36f_0 + 16f_1 - 3f_2) + \frac{h^4}{5} f^{v}(\xi) \tag{3.11.4}$$

$$f'_{-1} = \frac{1}{12h}(-3f_{-2} - 10f_{-1} + 18f_0 - 6f_1 + f_2) - \frac{h^4}{20} f^{v}(\xi) \tag{3.11.5}$$

$$f'_0 = \frac{1}{12h}(f_{-2} - 8f_{-1} + 8f_1 - f_2) + \frac{h^4}{30} f^{v}(\xi) \tag{3.11.6}$$

$$f'_1 = \frac{1}{12h}(-f_{-2} + 6f_{-1} - 18f_0 + 10f_1 + 3f_2) - \frac{h^4}{20} f^{v}(\xi) \tag{3.11.7}$$

$$f'_2 = \frac{1}{12h}(3f_{-2} - 16f_{-1} + 36f_0 - 48f_1 + 25f_2) + \frac{h^4}{5} f^{v}(\xi) \tag{3.11.8}$$


In each set of formulas, each & lies between the extreme values of the abscissas 
involved in that formula. It should be noticed that the coefficient in the trunca- 
tion error is least when the derivative is calculated at the central point, and 
that the ordinate at that point is then not involved in the calculation. 
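
The central and one-sided formulas can be compared numerically. The following is a minimal sketch, assuming that f can be evaluated at arbitrary points (whereas the text deals with tabulated values); the function names are illustrative and are not part of the text.

```python
import math

def d3_central(f, x0, h):
    """Three-point formula (3.11.2) at the central abscissa, error term omitted."""
    return (f(x0 + h) - f(x0 - h)) / (2.0 * h)

def d3_left_end(f, x0, h):
    """Three-point formula (3.11.1); x0 is the left-most of the three abscissas."""
    return (-3.0 * f(x0) + 4.0 * f(x0 + h) - f(x0 + 2.0 * h)) / (2.0 * h)

def d5_central(f, x0, h):
    """Five-point formula (3.11.6) at the central abscissa, error term omitted."""
    return (f(x0 - 2.0 * h) - 8.0 * f(x0 - h)
            + 8.0 * f(x0 + h) - f(x0 + 2.0 * h)) / (12.0 * h)

if __name__ == "__main__":
    h = 0.1
    exact = math.cos(1.0)  # derivative of sin x at x = 1
    for name, rule in [("three-point, central", d3_central),
                       ("three-point, left end", d3_left_end),
                       ("five-point, central", d5_central)]:
        approx = rule(math.sin, 1.0, h)
        print(f"{name:22s} {approx:.8f}  error {approx - exact:+.2e}")
```

For f(x) = sin x at x = 1 with h = 0.1, the central three-point error is smaller in magnitude than the one-sided error, in agreement with the coefficients h²/6 and h²/3 in the error terms.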

The sum of the magnitudes of the coefficients (and hence the significance 
of the effects of roundoff errors) is also least for the central point, but it increases 
rapidly with distance from the central point for a given set of formulas as well 
as with increasing order of successive sets. For reference purposes, the seven- 
and nine-point formulas for the derivative at the central tabular point are 
listed as follows: 
Seven-point formula:

$$f'_0 = \frac{1}{60h}\left(-f_{-3} + 9f_{-2} - 45f_{-1} + 45f_1 - 9f_2 + f_3\right) - \frac{h^6}{140} f^{vii}(\xi) \tag{3.11.9}$$

Nine-point formula:

$$f'_0 = \frac{1}{840h}\left(3f_{-4} - 32f_{-3} + 168f_{-2} - 672f_{-1} + 672f_1 - 168f_2 + 32f_3 - 3f_4\right) + \frac{h^8}{630} f^{ix}(\xi) \tag{3.11.10}$$
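
The coefficients of the central formulas can be checked by differentiating the lagrangian coefficient functions at the central point in exact rational arithmetic. The sketch below is one way of doing so; the function name `central_derivative_weights` is hypothetical, and the abscissas are taken as the integers -m, ..., m (that is, h = 1).

```python
from fractions import Fraction

def central_derivative_weights(m):
    """Weights w_j, j = -m..m, with f'(x0) ~ (1/h) * sum_j w_j * f(x0 + j*h).

    Each w_j is the derivative at s = 0 of the Lagrange coefficient function
    l_j(s) = prod_{k != j} (s - k) / (j - k) on the integer nodes -m..m.
    """
    nodes = list(range(-m, m + 1))
    weights = []
    for j in nodes:
        others = [k for k in nodes if k != j]
        denom = Fraction(1)
        for k in others:
            denom *= (j - k)
        # Product rule: d/ds prod(s - k) at s = 0 is a sum of products,
        # each omitting one factor.
        numer = Fraction(0)
        for i in others:
            term = Fraction(1)
            for k in others:
                if k != i:
                    term *= (0 - k)
            numer += term
        weights.append(numer / denom)
    return weights

if __name__ == "__main__":
    for m, scale in [(1, 2), (2, 12), (3, 60), (4, 840)]:
        print(2 * m + 1, [int(w * scale) for w in central_derivative_weights(m)])
```

Multiplying the weights by 2, 12, 60, and 840 gives the integer coefficients appearing in the three-, five-, seven-, and nine-point central formulas.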


An inspection of the numerical differentiation formulas reveals the 
existence of a new problem in error control. For example, consider (3.11.2) 
and suppose that it is known that 

$$|f'''(x)| \le M_3$$

on the interval $[x_0 - h,\ x_0 + h]$. Then if all given data were exact, the 
maximum possible error in the calculation of $f'(x_0)$ would be 

$$|E_3|_{\max} = \frac{M_3 h^2}{6}$$

On the other hand, suppose that each of the ordinates involved could be in 
error by $\pm\varepsilon$. Then the magnitude of the corresponding error in the calculation 
of $f'(x_0)$ could be as large as 

$$|R_3|_{\max} = \frac{\varepsilon}{h}$$

Whereas a reduction of the truncation error $E_3$ would generally require a 
decrease in $h$, a small value of $h$ would lead to a large possible roundoff error 
$R_3$ and, conversely, a reduction in $|R_3|_{\max}$ would generally correspond to an 
increase in $|E_3|_{\max}$. 

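A reasonable procedure consists of determining the interval $h$ such that 
the predictable upper bounds on the two errors are about equal if this is feasible. 
The optimum value of $h$ and the corresponding maximum total error $T_3$ are 
then found to be 

$$h_{3,\mathrm{opt}} \approx 1.8\,\varepsilon^{1/3} M_3^{-1/3} \qquad |T_3|_{\max} \approx 1.1\,\varepsilon^{2/3} M_3^{1/3}$$

Corresponding results relevant to (3.11.6), (3.11.9), and (3.11.10) can be 
obtained as follows: 

$$h_{5,\mathrm{opt}} \approx 2.1\,\varepsilon^{1/5} M_5^{-1/5} \qquad |T_5|_{\max} \approx 1.4\,\varepsilon^{4/5} M_5^{1/5}$$

$$h_{7,\mathrm{opt}} \approx 2.2\,\varepsilon^{1/7} M_7^{-1/7} \qquad |T_7|_{\max} \approx 1.7\,\varepsilon^{6/7} M_7^{1/7}$$

$$h_{9,\mathrm{opt}} \approx 2.2\,\varepsilon^{1/9} M_9^{-1/9} \qquad |T_9|_{\max} \approx 1.9\,\varepsilon^{8/9} M_9^{1/9}$$
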
In illustration, suppose that empirical values were to be obtained for a function 
which is truly of the form $f(x) = \sin \omega x$, and that one of these formulas were 
to be used to approximate $f'(0) = \omega$. In this case, the relevant quantities 
$M_k$ are each equal to $\omega^k$. Thus, if, say, the maximum observational error $\varepsilon$ 
is 0.01, the optimum spacings for the three-, five-, seven-, and nine-point 
formulas are found to be about $0.39/\omega$, $0.84/\omega$, $1.14/\omega$, and $1.32/\omega$, respectively, 
and the corresponding maximum total errors in the calculation of $f'(0) = \omega$ 
are found to be about $0.051\omega$, $0.035\omega$, $0.033\omega$, and $0.032\omega$, respectively. The 
increase of $h_{n,\mathrm{opt}}$ with increasing $n$, and the fact that an increase in $n$ affords only 
slight improvement in guaranteed accuracy, are both worthy of note. 
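
The balancing of the two error bounds can be reproduced with a few lines of code. The sketch below assumes that the roundoff bound is (ε/h) times the sum of the magnitudes of the coefficients of the central formula, and that the truncation bound is c·M·h^(n-1) with c taken from the error term of that formula; the names `CENTRAL` and `balanced_spacing` are illustrative only.

```python
# (sum of |coefficients|, truncation constant c) for the central formulas
# (3.11.2), (3.11.6), (3.11.9), (3.11.10), read off from the formulas.
CENTRAL = {
    3: (2.0 / 2.0, 1.0 / 6.0),
    5: (18.0 / 12.0, 1.0 / 30.0),
    7: (110.0 / 60.0, 1.0 / 140.0),
    9: (1750.0 / 840.0, 1.0 / 630.0),
}

def balanced_spacing(n, eps, M):
    """Spacing h at which truncation bound c*M*h**(n-1) equals roundoff bound s*eps/h.

    Returns (h, total), where total is the sum of the two (equal) bounds.
    M is a bound on the n-th derivative; eps bounds the error in each ordinate.
    """
    s, c = CENTRAL[n]
    h = (s * eps / (c * M)) ** (1.0 / n)
    total = 2.0 * s * eps / h
    return h, total

if __name__ == "__main__":
    # f(x) = sin(omega*x) with omega = 1, so that every M_k = 1, and eps = 0.01.
    for n in (3, 5, 7, 9):
        h, total = balanced_spacing(n, 0.01, 1.0)
        print(f"n = {n}:  h = {h:.3f}   total bound = {total:.4f}")
```

With ε = 0.01 and ω = 1 the computed values agree with those quoted in the text to within the effect of rounding the constants to two figures before use.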

The results of this example are typical of most practical situations in which 
the function f(x) is representable by a Taylor series which converges for all 
values of x. When the series representations have finite radii of convergence, 
the quantities $M_{2k+1}$ tend to increase with increasing $k$, and the incorporation of 
additional ordinates may lead to a decrease in guaranteed accuracy, at an early 
stage, when the inaccuracies in the given data are appreciable. (See Probs. 
57 and 58.) 

In practice, unless f(x) is given analytically, the truncation error relevant 
to any lagrangian formula can be estimated only roughly by making two or 
more independent calculations, based on different sets of ordinates, or by 
determining sample values of the divided difference of order equal to the number 
of ordinates used. It is apparent that recourse to the latter alternative would 
tend to nullify the computational advantages which are inherent to the lagrangian 
methods. However, when equally spaced abscissas are used, divided differences 
of a given order can be calculated conveniently by use of simple formulas (see 
Probs. 7 and 8 of Chap. 2), without resort to the formation of a divided-difference 
table or to the calculation of intermediate differences of lower order. Equation 
(2.3.2) is available for the same purpose in the general case. 

It is useful to notice that since 

$$\frac{f(x_0 + h) - f(x_0 - h)}{2h} = f'(x_0) + \frac{h^2}{3!} f'''(x_0) + \frac{h^4}{5!} f^{v}(x_0) + \cdots$$

when f(x) is regular at $x_0$, it follows that for any such function the formula 
(3.11.2) can be written in the form 

$$f'_0 = \frac{f_1 - f_{-1}}{2h} - c_1 h^2 - c_2 h^4 - \cdots \tag{3.11.11}$$

where $c_k = f^{(2k+1)}(x_0)/(2k + 1)!$. Thus the use of iterated Richardson 
extrapolation (Sec. 3.8) then is appropriate (if sufficiently many significant 
figures can be retained to control the effects of roundoff errors) and recursive 
calculation can be formulated in analogy to Romberg integration. 
Specifically, if $D_0^{(0)}$ denotes the approximation to $f'_0 = f'(x_0)$ afforded 
by (3.11.2) (with the error term omitted) for a certain choice of $h$, and if $D_0^{(k)}$ 
denotes the result obtained when $h$ has been halved $k$ times, there follows 

$$D_0^{(k)} = \frac{f(x_0 + h_k) - f(x_0 - h_k)}{2h_k} \tag{3.11.12}$$

where 

$$h_k = \frac{h}{2^k} \qquad (k = 0, 1, 2, \ldots) \tag{3.11.13}$$

If then we define 

$$D_m^{(k)} = \frac{4^m D_{m-1}^{(k+1)} - D_{m-1}^{(k)}}{4^m - 1} \qquad (m = 1, 2, \ldots) \tag{3.11.14}$$

in complete analogy to the definition (3.8.5), we can construct the array 

$$\begin{array}{lll}
D_0^{(0)} & & \\
D_0^{(1)} & D_1^{(0)} & \\
D_0^{(2)} & D_1^{(1)} & D_2^{(0)}
\end{array}$$

in which $D_m^{(k)}$ affords an approximation to $f'_0$ with a truncation error of order 
$h_k^{2m+2}$. In particular, it can be verified that $D_1^{(k)}$ is identical with the result of 
applying (3.11.6) with $h$ replaced by $h_{k+1}$. 

In illustration, when $f(x) = -1/(x + 1)$ and $x_0 = 0$, so that $f'_0 = 1$, 
an array of approximations corresponding to the choice $h = 0.5$ (for the initial 
spacing) is obtained as follows when each entry is rounded to eight decimal 
places as it is calculated: 


$D_0^{(k)}$ $D_1^{(k)}$ $D_2^{(k)}$ $D_3^{(k)}$ $D_4^{(k)}$ 


1.33333 333 

1.06666 667 0.97777 778 

1.01587 302 0.99894 180 1.00035 273 

1.00392 157 0.99993 775 1.00000 415 0.99999 862 

1.00097 752 0.99999 617 1.00000 006 1.00000 000 1.00000 001 


The effect of roundoff is visible in the last entry. 
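
The array above can be regenerated by a short program. The following is a sketch only: it assumes that f can be evaluated directly, stores the entry with lower index m that lies in row k of the printed array at `table[k][m]`, and rounds each entry to eight decimal places as it is formed. The name `richardson_table` is illustrative.

```python
def richardson_table(f, x0, h, rows, digits=8):
    """Triangular array of Richardson extrapolants of the central difference.

    table[k][0] is the central-difference quotient with spacing h / 2**k;
    table[k][m] combines table[k][m-1] (finer spacing) with table[k-1][m-1]
    (coarser spacing) using the factor 4**m, as in a Romberg table.
    """
    table = []
    for k in range(rows):
        hk = h / 2 ** k
        row = [round((f(x0 + hk) - f(x0 - hk)) / (2.0 * hk), digits)]
        for m in range(1, k + 1):
            finer = row[m - 1]
            coarser = table[k - 1][m - 1]
            row.append(round((4 ** m * finer - coarser) / (4 ** m - 1), digits))
        table.append(row)
    return table

if __name__ == "__main__":
    # Example of the text: f(x) = -1/(x + 1), x0 = 0, initial spacing h = 0.5.
    for row in richardson_table(lambda x: -1.0 / (x + 1.0), 0.0, 0.5, 5):
        print("  ".join(f"{entry:.8f}" for entry in row))
```

Because each entry is rounded before it is reused, the final entries may differ from the true value 1 in the eighth decimal place, as in the printed array.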


3.12 Supplementary References 


Extensive tables of lagrangian interpolation coefficients relevant to equally 
spaced abscissas are provided in NBS [1944], Pearson [1920a], and in other 
sources listed in the comprehensive index of tables by Fletcher, Miller, Rosenhead, 
and Comrie [1962]. Coefficients for lagrangian polynomial differentiation 
are given by Salzer [1948]. Tables of coefficients for lagrangian exponential 
interpolation are given by Luke [1953], and tables for trigonometric interpola- 
tion (see Prob. 7) are given by Salzer [1949]. 

Tables for inverse lagrangian polynomial interpolation are given by 
Salzer [1944a, 1945]. Salzer [1951] also provides formulas of lagrangian type 
for determining the argument for which the derivative of a function takes on a 
prescribed value. 

Comprehensive treatments of numerical integration (quadrature) are 
listed in Sec. 8.17. 

Richardson extrapolation and Romberg integration are considered in 
Henrici [1964] and Ralston [1965]. Convergence of a Romberg sequence is 
dealt with by Bauer, Rutishauser, and Stiefel [1963], and various modifications 
of the Romberg process are summarized in Davis and Rabinowitz [1967]. 

The last reference also includes a consideration of additional iterative 
schemes and of other “general-purpose” algorithms which are intended to 
permit computer evaluation of an integral within a prescribed tolerance without 
the need for any analysis on the part of the user. References are provided, 
together with warnings of dangers inherent in blind reliance on such automatic 
integrators. 

A very complete listing of formulas for numerical integration, with various 
weighting functions, in terms of linear combinations of ordinates at equally 
spaced points, is given by Miller [1960a]. 

Tables of the coefficients in the formulas of Filon [1928] may be found in 
the NBS [1964] handbook and in other sources listed in the index by Fletcher 
et al. [1962]. Generalizations of Filon’s method include one obtained by 
Flinn [1960] by approximating f(x) on each 2h interval by a polynomial of 
degree 5 such that there is agreement with both f(x) and f’(x) at the division 
points. This and other related formulas appear in Krylov and Kruglikova 
[1969], together with treatments of the case b = ∞ and with associated 
numerical tables. 

Numerical integration over regions in two or more dimensions (see Probs. 
40 and 41) is treated extensively in Stroud [1971]. See also Irwin [1923], 
Radon [1948], Willers [1950], Steffensen [1950], Hammer [1959], Miller 
[1960b], Stroud [1967], Davis and Rabinowitz [1967], and Haber [1970]. 

The fact that the usefulness of the simple midpoint formula for one- 
dimensional integration was long overlooked in the literature and also that it 
generalizes naturally to a centroid method, which is useful for integration in 
higher dimensions, is discussed by Good and Gaskins [1971]. See also 
Hammer [1958]. 
PROBLEMS 
Section 3.2 


1 By noticing that the zeroth lagrangian coefficient function of degree n takes on 
the value unity when $x = x_0$ and the value zero when $x = x_1, \ldots, x_n$, and by 
considering the associated divided-difference table (or otherwise), show that 

$$l_0(x) = 1 + \frac{x - x_0}{x_0 - x_1} + \frac{(x - x_0)(x - x_1)}{(x_0 - x_1)(x_0 - x_2)} + \cdots + \frac{(x - x_0)\cdots(x - x_{n-1})}{(x_0 - x_1)\cdots(x_0 - x_n)}$$


and that similar expansions can be written down by symmetry for the other co- 
efficient functions. 

2 Derive the lagrangian interpolation formula directly from the newtonian divided- 
difference formula. 

3 If y(x) is the polynomial of degree n which agrees with f(x) at the distinct points 
$x = x_0, x_1, \ldots, x_n$, and if $\pi(x) = (x - x_0)(x - x_1)\cdots(x - x_n)$, obtain the 
lagrangian form of y(x) by determining the coefficients in the partial fraction 
expansion of the ratio 

$$\frac{y(x)}{\pi(x)} = \sum_{k=0}^{n} \frac{a_k}{x - x_k}$$

(Multiply both members by $x - x_r$ and let $x \to x_r$.) 


4 Show that 

$$\begin{vmatrix} 1 & a_1 & a_1^2 \\ 1 & a_2 & a_2^2 \\ 1 & a_3 & a_3^2 \end{vmatrix} = (a_2 - a_1)(a_3 - a_1)(a_3 - a_2)$$

and use this fact to express the result of expanding the left-hand member of (3.2.3) 
with respect to the elements of the first column, and equating the result to zero, 
in lagrangian form when n = 2. 

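5 Generalize the result of Prob. 4 to show that 

$$\begin{vmatrix}
1 & a_1 & a_1^2 & \cdots & a_1^{n-1} \\
1 & a_2 & a_2^2 & \cdots & a_2^{n-1} \\
1 & a_3 & a_3^2 & \cdots & a_3^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & a_n & a_n^2 & \cdots & a_n^{n-1}
\end{vmatrix}
= (a_2 - a_1)(a_3 - a_1)(a_3 - a_2)(a_4 - a_1)(a_4 - a_2)(a_4 - a_3)\cdots(a_n - a_{n-1})$$

and to derive the lagrangian form of the interpolation polynomial from (3.2.3) in 
the general case. (The determinant involved here is often called Vandermonde’s 
determinant.) 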
6 By considering the limit of the three-point lagrangian interpolation formula 
relative to $x_0$, $x_0 + \varepsilon$, and $x_1$, as $\varepsilon \to 0$, obtain the formula 

$$f(x) = \frac{(x_1 - x)(x + x_1 - 2x_0)}{(x_1 - x_0)^2} f(x_0) + \frac{(x - x_0)(x_1 - x)}{x_1 - x_0} f'(x_0) + \frac{(x - x_0)^2}{(x_1 - x_0)^2} f(x_1) + E(x)$$

where 

$$E(x) = \tfrac{1}{6}(x - x_0)^2 (x - x_1) f'''(\xi)$$


7 Write down a determinantal equation analogous to (3.2.3) but corresponding to 
the requirement that 

$$y(x) = A_0 + A_1 \cos x + A_2 \sin x$$

agree with f(x) when $x = x_0$, $x_1$, and $x_2$. Then establish the identity 

$$\begin{vmatrix} 1 & \cos a_1 & \sin a_1 \\ 1 & \cos a_2 & \sin a_2 \\ 1 & \cos a_3 & \sin a_3 \end{vmatrix} = 4 \sin \tfrac12(a_2 - a_1)\, \sin \tfrac12(a_3 - a_1)\, \sin \tfrac12(a_3 - a_2)$$

and use this result to express y(x) in the following form (due to Gauss): 

$$y(x) = \frac{\sin \tfrac12(x - x_1) \sin \tfrac12(x - x_2)}{\sin \tfrac12(x_0 - x_1) \sin \tfrac12(x_0 - x_2)} f(x_0) + \frac{\sin \tfrac12(x - x_0) \sin \tfrac12(x - x_2)}{\sin \tfrac12(x_1 - x_0) \sin \tfrac12(x_1 - x_2)} f(x_1) + \frac{\sin \tfrac12(x - x_0) \sin \tfrac12(x - x_1)}{\sin \tfrac12(x_2 - x_0) \sin \tfrac12(x_2 - x_1)} f(x_2)$$

Show also that the formula resulting from deleting the $\tfrac12$’s in the arguments of the 
sines (and due to Hermite) defines the approximation 

$$y = A_0 + A_1 \cos 2x + A_2 \sin 2x$$

which agrees with f(x) at the same three points. Also predict the form of the 
generalization of the Gauss formula to the case when an approximation of the 
form 

$$y = A_0 + A_1 \cos x + A_2 \sin x + \cdots + A_{2n-1} \cos nx + A_{2n} \sin nx$$

is to agree with f(x) at the 2n + 1 points $x_0, x_1, \ldots, x_{2n}$, and verify the correct- 
ness of this conjecture. 


Section 3.3 


8 Prove that (3.3.20) is valid when x is outside the span R of the values $x_0, \ldots, x_n$ 
by writing $F(x) = f(x) - y(x) - K\pi(x)$, showing that the function 

$$F^{(r)}(x) = f^{(r)}(x) - y^{(r)}(x) - K\pi^{(r)}(x)$$

vanishes at least n − r + 1 times inside R, showing that all zeros of $\pi^{(r)}(x)$ lie 
inside R and hence that K can be so chosen that $F^{(r)}(x)$ also vanishes when 
$x = \bar{x}$ if $\bar{x}$ is not inside R, so that 

$$E^{(r)}(\bar{x}) = K\pi^{(r)}(\bar{x})$$

and proving that, with this K, there follows $0 = f^{(n+1)}(\eta) - (n + 1)!\,K$ for some 
$\eta$ between the smallest and the largest of $x_0, \ldots, x_n$, and $\bar{x}$. 
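9 Obtain the lagrangian two-point first-derivative formula in the form 

$$f'(x) = \frac{1}{h}\left[f(x_0 + h) - f(x_0)\right] + E'(x)$$

where 

$$E'(x) = \frac{d}{dx}\left\{(x - x_0)(x - x_1) f[x_0, x_1, x]\right\}$$

with $h = x_1 - x_0$, and hence also [by (3.3.17)] 

$$E'(x) = \left(x - \frac{x_0 + x_1}{2}\right) f''(\xi_1) + \tfrac16 (x - x_0)(x - x_1) f'''(\xi_2)$$

where $\xi_1$ and $\xi_2$ are in the interval spanned by $x_0$, $x_1$, and x. Deduce the special 
cases 

$$f'\!\left(\frac{x_0 + x_1}{2}\right) = \frac{f(x_1) - f(x_0)}{h} - \frac{h^2}{24} f'''(\xi)$$

$$f'(x_0) = \frac{f(x_1) - f(x_0)}{h} - \frac{h}{2} f''(\xi)$$

$$f'(x_1) = \frac{f(x_1) - f(x_0)}{h} + \frac{h}{2} f''(\xi)$$

where each $\xi$ is in $(x_0, x_1)$. 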
10 By writing the error term in Prob. 9 in the form 

$$E'(x) = \frac{d}{dx}\left\{(x - x_1)\left(f[x_1, x] - f[x_0, x_1]\right)\right\}$$

show that 

$$E'(x) = (x - x_1) f[x_1, x, x] + (x - x_0) f[x_0, x_1, x] = \tfrac12 (x - x_1) f''(\xi_1) + \tfrac12 (x - x_0) f''(\xi_2)$$

and hence that 

$$|E'(x)| \le \frac{M_2}{2}\left(|x_1 - x| + |x - x_0|\right)$$

if $|f''(x)| \le M_2$ in the interval spanned by $x_0$, $x_1$, and x. In particular, in the 
case when $x_0 \le x \le x_1$, deduce that 

$$|E'(x)| \le \tfrac12 M_2 h$$

whereas the result of Prob. 9 [or of (3.3.17)] gives 

$$|E'(x)| \le \tfrac12 M_2 h + \tfrac{1}{24} M_3 h^2$$

if also $|f'''(x)| \le M_3$ when $x_0 \le x \le x_1$ in that case. [Similar manipulations 
are possible in less simple cases for the purpose of obtaining derivative-formula 
error bounds, which involve only bounds on lower-order derivatives of f(x), in 
contrast with (3.3.15). In the present case the special midpoint error bound in 
Prob. 9 is sacrificed, but a better “global” bound on $[x_0, x_1]$ is in fact obtained; 
in most cases, the added simplicity is accompanied by increased conservatism.] 
11 Obtain the lagrangian three-point first-derivative formula, in the case when the 
abscissas are equally spaced, at spacing h, and the origin is taken at the central 
point, in the form 

$$f'(x) = \frac{2x - h}{2h^2} f(-h) - \frac{2x}{h^2} f(0) + \frac{2x + h}{2h^2} f(h) + E'(x)$$

with 

$$E'(x) = \tfrac16 (3x^2 - h^2) f'''(\xi_1) + \tfrac{1}{24} x(x^2 - h^2) f^{iv}(\xi_2)$$

where $\xi_1$ and $\xi_2$ are in (−h, h) if x is in that interval. Show also that, unless 
$f^{iv}$ is large in magnitude relative to $f'''$, the absolute error is least, on the average, 
at distances of about 0.6h from the central point. 

12 By integrating the lagrangian three-point formula, when the abscissas are at equal 
spacing h, with the origin taken at the central point, obtain the formula 

$$\int_{-h}^{x} f(t)\,dt = \frac{5h^3 - 3hx^2 + 2x^3}{12h^2} f(-h) + \frac{2h^3 + 3h^2 x - x^3}{3h^2} f(0) + \frac{-h^3 + 3hx^2 + 2x^3}{12h^2} f(h) + T(x)$$

and show that the truncation error is expressible in the form 

$$T(x) = \int_{-h}^{x} t(t^2 - h^2) f[-h, 0, h, t]\,dt$$
13 If the upper limit of the integration in Prob. 12 does not lie outside the interval 
[−h, 0], show that 

$$T(x) = \tfrac{1}{24}(x^2 - h^2)^2 f'''(\xi)$$

where $-h < \xi < h$. In particular, deduce the formula 

$$\int_{x_0}^{x_0 + h} f(x)\,dx = \frac{h}{12}\left[5f(x_0) + 8f(x_0 + h) - f(x_0 + 2h)\right] + \frac{h^4}{24} f'''(\xi)$$

where $x_0 < \xi < x_0 + 2h$, after a change in notation. 
14 By integrating the expression for T(x) in Prob. 12 by parts, and noticing that 
$x(x^2 - h^2) = \tfrac14[(x^2 - h^2)^2]'$, show that 

$$T(x) = \tfrac14 (x^2 - h^2)^2 f[-h, 0, h, x] - \tfrac14 \int_{-h}^{x} (t^2 - h^2)^2 f[-h, 0, h, t, t]\,dt$$

and deduce that 

$$T(x) = \tfrac{1}{24}(x^2 - h^2)^2 f'''(\xi_1) - \tfrac{1}{1440}\left(3x^5 - 10h^2x^3 + 15h^4x + 8h^5\right) f^{iv}(\xi_2)$$

where $\xi_1$ and $\xi_2$ lie between the smallest and largest of −h, h, and x. In particular, 
deduce the formula of Simpson’s rule (see Sec. 3.5): 

$$\int_{x_0}^{x_0 + 2h} f(x)\,dx = \frac{h}{3}\left[f(x_0) + 4f(x_0 + h) + f(x_0 + 2h)\right] - \frac{h^5}{90} f^{iv}(\xi)$$

where $x_0 < \xi < x_0 + 2h$. 
15 Determine the lagrangian coefficient functions, in explicit polynomial form, 
relative to the ordinates of f(x) at the four points x = −2, −1, 1, and 2. Use the 
results to obtain approximate expressions for $f(0)$, $f'(0)$, and $\int_{-2}^{2} f(x)\,dx$ in terms 
of those ordinates. 

16 Use the results of Prob. 15 to determine the equation of the third-degree poly- 
nomial passing through the points (−2, −5), (−1, −1), (1, 1), and (2, 11). 

17 Use the Lagrange interpolation formula to calculate approximate values of f(x) 
when x = 1.1300, 1.1500, 1.1700, and 1.1900 from the following rounded data: 

      x      1.1275    1.1503    1.1735    1.1972 
    f(x)    0.11971   0.13957   0.15931   0.17902 

18 Use the results of Prob. 17 and the coefficients of Table 3.1 to determine approx- 
imate values of f(x) for x = 1.1600(0.0010)1.1700. 

19 Under the assumption that the data in Prob. 17 correspond to the function 
f(x) = sin (log x), obtain bounds on the truncation errors associated with the 
values calculated in Probs. 17 and 18. 

20 Obtain bounds on the roundoff errors associated with the values calculated in 
Probs. 17 and 18. 

21 Use the table of five-point lagrangian coefficients given in Sec. 4.12, to interpolate 
in that table itself for the coefficients relative to s = 0.38, 0.05, and 1.93, rounding 
the results to six places. If no roundoffs were effected, what errors would be 
present in the calculated coefficients? 

22 Show that, if $h^3|f'''(x)|$ does not exceed 16 units in the last place to be retained 
in a three-point Lagrange interpolation based on equally spaced abscissas with 
spacing h, then the truncation error cannot exceed one unit in that place. 

23 Show that, if $h^5|f^{v}(x)|$ does not exceed 32 units in the last place to be retained 
in a five-point Lagrange interpolation based on equally spaced abscissas with spac- 
ing h, then the truncation error cannot exceed 1 unit in that place, and also that 
$h^5|f^{v}(x)|$ may be as large as 84 units if the interpolation is effected only between 
the second and fourth of the five successive abscissas. 


Section 3.5 


24 Prove directly, from Eq. (3.5.6), that a Newton-Cotes formula of closed type, 
employing n + 1 ordinates, is exact when applied to any polynomial of degree 
n + 1 when n + 1 is odd. (Notice that F[0, 1, ..., n, s] is then constant, write 
s = t + (n/2), and show that the resultant integrand is an odd function of t.) 

25 Show that the factor s − (n/2) can be replaced by s − c in (3.5.9b), where c is 
any constant (see Prob. 24). 

26 Derive the formulas resulting from neglect of the error terms in (3.5.19) and 
(3.5.20). 

27 Show that the truncation error associated with a Newton-Cotes formula of 
closed type employing n + 1 ordinates can be expressed in the form 

$$E = h^{n+2} \int_0^n s(s - 1)\cdots(s - n)\, f[x_0, \ldots, x_n, x_0 + hs]\,ds$$

whereas that associated with a formula of open type employing n − 1 ordinates 
is given by 

$$E = h^{n} \int_0^n (s - 1)\cdots(s - n + 1)\, f[x_1, \ldots, x_{n-1}, x_0 + hs]\,ds$$

28 If $f_{(2k+1)/2}$ denotes the value of f(x) at an abscissa midway between $x_k$ and 
$x_{k+1} = x_k + h$, derive the formulas 

$$\int_{x_0}^{x_2} f(x)\,dx = h\left(f_{1/2} + f_{3/2}\right) + E_1$$

$$\int_{x_0}^{x_3} f(x)\,dx = \frac{3h}{8}\left(3f_{1/2} + 2f_{3/2} + 3f_{5/2}\right) + E_2$$

[These formulas, together with the midpoint formula (3.5.23), are the first members 
of a set due to Maclaurin. It can be shown that $E_1 = h^3 f''(\xi)/12$ $(x_0 < \xi < x_2)$ 
and that $E_2 = 21h^5 f^{iv}(\xi)/640$ $(x_0 < \xi < x_3)$.] 
29 A Riemann sum associated with an integral $I = \int_a^b F(x)\,dx$ is an approximation of the 
form 

$$\sum_{k=0}^{n} F(t_k)(s_{k+1} - s_k)$$

where 

$$a = s_0 \le t_0 \le s_1 \le t_1 \le s_2 \le \cdots \le s_n \le t_n \le s_{n+1} = b$$

Any sequence of such sums in which the subdivision of [a, b] is refined in such 
a way that max $(s_{k+1} - s_k) \to 0$ tends to the (Riemann) integral I if it exists. 

(a) Show that the approximations afforded by the repeated midpoint rule, the 
trapezoidal rule, and the parabolic rule are Riemann sums. (Display the values 
of $s_1, s_2, \ldots, s_n$ in each case.) 

(b) The relation 


b 
[ feo a = 2 +ththt+1043+A+4+ 10% +--- 


+ 10f,-3 + fr-2 + -- + Shr) 


with the notation of Sec. 3.6 is an equality when f(x) is any linear function. 

Prove that the approximation is not a Riemann sum. 
30 Convergence of composite rules. Suppose that [a, b] is divided into r equal parts 
by $a = X_0 < X_1 < \cdots < X_{r-1} < X_r = b$, and let $(b - a)/r = H$. If an 
m-point formula which yields exact results when integrating a constant is used to 
approximate the integral of f(x) over each subinterval $[X_i, X_{i+1}]$, prove that the 
sum converges to the integral over [a, b] as the spacing $H \to 0$. [If the result of 
applying the m-point formula to $[X_0, X_1]$ is of the form 

$$\int_{X_0}^{X_1} f(x)\,dx \approx H \sum_{k=0}^{m-1} w_k f(X_0 + \alpha_k) \qquad (0 \le \alpha_k \le H)$$

show that the total approximation is given by 

$$\int_a^b f(x)\,dx \approx \sum_{k=0}^{m-1} w_k \left(H \sum_{i=0}^{r-1} f(X_i + \alpha_k)\right)$$

and that the inner sum is a Riemann sum for f(x) over [a, b]. Then let $r \to \infty$ 
and complete the proof. See also Davis and Rabinowitz [1967], Sec. 2.4.] 
31 Show that the composite rule corresponding to the repeated use of Newton’s 
three-eighths rule (3.5.12) is of the form 

$$\int_{x_0}^{x_n} f(x)\,dx = \frac{3h}{8}\left(f_0 + 3f_1 + 3f_2 + 2f_3 + 3f_4 + 3f_5 + 2f_6 + \cdots + 3f_{n-1} + f_n\right) - \frac{nh^5}{80} f^{iv}(\xi)$$

where n is to be an integral multiple of 3. Also, by considering the case when n 
is a multiple of 6, so that both this rule and the parabolic rule can be used with the 
same spacing h, account for the fact that the parabolic rule is nearly always pre- 
ferred. 
Section 3.7 


32 Given the following rounded values of the function 

$$f(x) = \sqrt{\frac{2}{\pi}}\, e^{-x^2/2}$$

calculate approximate values of the integral 

$$P(1) = \sqrt{\frac{2}{\pi}} \int_0^1 e^{-t^2/2}\,dt = 0.6826895$$

by use of the trapezoidal rule with h = 1, 1/2, 1/4, and 1/8, and compare the results 
with the rounded true value: 

      x        f(x)          x        f(x) 
    0.000   0.7978846      0.625   0.6563219 
    0.125   0.7916754      0.750   0.6022749 
    0.250   0.7733362      0.875   0.5441100 
    0.375   0.7437102      1.000   0.4839414 
    0.500   0.7041307 

33 Repeat the calculations of Prob. 32 using instead the repeated midpoint rule 
with h = 1, 1/2, and 1/4. 

34 Repeat the calculations of Prob. 32 using instead the parabolic rule with 
h = 1/2, 1/4, and 1/8. 

35 Repeat the calculations of Prob. 32 using instead the Newton-Cotes three-, five-, 
and nine-point formulas of closed type. 

36 Calculate approximate values of the integral 

$$\int_0^1 e^{\cos \pi x}\,dx = I_0(1) = 1.2660659$$

by use of the trapezoidal rule with h = 1, 1/2, 1/4, and 1/8, retaining seven decimal 
places, and compare the results with the rounded true value. 

37 Repeat the calculations of Prob. 36 using instead the repeated midpoint rule with 
h = 1, 1/2, and 1/4. 

38 Repeat the calculations of Prob. 36 using instead the parabolic rule with 
h = 1/2, 1/4, and 1/8. 

39 Repeat the calculations of Prob. 36 using instead the Newton-Cotes three-, five-, 
and nine-point formulas of closed type. 

40 By a double application of Simpson’s rule, derive the formula 

$$\int_{x_0}^{x_2}\!\int_{y_0}^{y_2} f(x, y)\,dx\,dy = \frac{hk}{9}\left[(f_{0,0} + f_{0,2} + f_{2,0} + f_{2,2}) + 4(f_{0,1} + f_{1,0} + f_{1,2} + f_{2,1}) + 16 f_{1,1}\right] + E$$

where $x_r = x_0 + rh$, $y_s = y_0 + sk$, and $f_{r,s} = f(x_r, y_s)$, and show that 

$$E = -\frac{hk}{45}\left[h^4 \frac{\partial^4 f(\xi_1, \eta_1)}{\partial x^4} + k^4 \frac{\partial^4 f(\xi_2, \eta_2)}{\partial y^4}\right]$$

where $\xi_1$, $\xi_2$ lie in $(x_0, x_2)$ and $\eta_1$, $\eta_2$ in $(y_0, y_2)$. [More elaborate formulas for 
two-way integration over a rectangle (“cubature formulas”) are obtainable by 
double application of other one-dimensional integration formulas.] 

41 By applying the formula of Prob. 40 to subrectangles and adding the results, 
derive the two-dimensional generalization of the parabolic rule in the form 

$$\int_{x_0}^{x_m}\!\int_{y_0}^{y_n} f(x, y)\,dx\,dy = \frac{hk}{9}\Big[(f_{0,0} + 4f_{1,0} + 2f_{2,0} + \cdots + f_{m,0}) + 4(f_{0,1} + 4f_{1,1} + 2f_{2,1} + \cdots + f_{m,1}) + 2(f_{0,2} + 4f_{1,2} + 2f_{2,2} + \cdots + f_{m,2}) + \cdots + (f_{0,n} + 4f_{1,n} + 2f_{2,n} + \cdots + f_{m,n})\Big] + E$$


where : 
ε--. kant LEM) ps PFE 2» Tr) 
90 ax* oy* 


for some points (€,, 74) and (€5, 72) inside the rectangle, when m and n are even 
integers. : 
Obtain an approximate evaluation of the integral 


1 cos x 


ο vx 


tdx _ [*1—cosx 
ovx Jo vx 
evaluating the first integral analytically, and applying the parabolic rule with 
h = 4 to the second one. 
(b) By making the change of variables x = ¢? in the original form and applying 
the parabolic rule with h = 4 directly to the result. 
Also compare the approximations with a more accurate value obtained by expand- 
ing the integrand of one of the forms in a power series and integrating term by 
term. 


xX 


(a) By writing it in the form 


dx 
43 Prove that in Romberg integration $T_1^{(k)}$ is the result of applying the parabolic 
rule $2^k$ times and $T_2^{(k)}$ is the result of applying the Newton-Cotes closed five-point 
formula $2^k$ times. 

44 Apply Romberg integration to the data of Prob. 32. 

45 Apply Romberg integration to Prob. 36 using a total of nine ordinates. 
Section 3.9 
46 Show that 


2m+1 
Ϊ s(s -- 1)---(s -- 2m — 1) ds 


0 5 ᾿ 
= s(s — 1)---(@ — 2m — 2) ds 
2m + 3 Jo 


when m is a nonnegative integer. (Express the left-hand integral as a sum of 
integrals between successive integers, translate all lower limits to zero, and show 
that the 2m + 1 terms in the resultant integrand can be telescoped into the sum 
of two terms. Then replace s by 1 — sin the integrand of one of those terms.) 

47 Show that the numerical factors in (3.9.1) and (3.9.2) can be expressed in the 
form 


ps 2m s(s — 1)---(s — 2m)\(s — 2m -- 1) 4, 
τὰ [ (2m + 2)! 


when 7 = 2m, and in the form 


I - ome s(s -- 1). 0 -- 2m)(s -- 2m -- 1) ας 
Ν᾿ (2m + 2)! 
when n = 2m + 1, and show also that 
1 s(s —1)---(s —-2m—1 
lames ~ tam = [ s(s -- 1)-+-(s — 2m — 1) ὰ 


ὃ (2m + 2)! 
48 With the abbreviation 
1 — eee es 
a δ(δ — 1)---(8 —kK + 1) ὦ 
ο k! 
show that the results of Probs. 46 and 47 lead to the relations 


Lom = — 2h om43 —~ Gom+2 
Iom+1 = — 2Gom+3 


and deduce that the error associated with a Newton-Cotes formula of closed type, 
employing + 1 ordinates, can be expressed in the form 


E.= βῆς cae (n odd) 
"(= (Qoans3 + Ong 2)h"t3¢"2) (πη even) 


49 The constant «, defined in Prob. 48 is expressible as a generalized Bernoulli number 
and is often denoted either by B&(1)/k! or by BO/k! + B&SYV(K — 1)!. It is 
known (see Steffensen [1950]) that 


(-- 1 


Kicsey? π΄) 


AK 
Assuming this fact, show that the numerical factor in the expression for E,, is 
approximated by — 2/[n(log n)?] when n is a large odd integer, and is approx- 
imated by —1/[n(log n)*] when n is a large even integer. 

50 Suppose that f(z) is an analytic function of the complex variable z in a simply 
connected region $\mathscr{R}$ of the complex plane which includes the segment $a \le x \le b$ 
of the real axis. By making use of the fact that then 

$$f^{(n)}(z_0) = \frac{n!}{2\pi i} \oint_C \frac{f(z)}{(z - z_0)^{n+1}}\,dz$$

for any $z_0$ in $\mathscr{R}$, where C is any simple closed curve in $\mathscr{R}$ enclosing $z_0$, deduce that 

$$|f^{(n)}(x)| \le \frac{n!\,ML}{2\pi d^{\,n+1}} \qquad (a \le x \le b)$$

where $|f(z)| \le M$ on C, L is the length of C, and d is the shortest distance from 
a point x in [a, b] to a point z on C. [Notice that C can be any simple closed 
curve, enclosing the segment $a \le x \le b$ (once) but excluding all singular points 
of f(z).] 


Section 3.10 
5I Show that 


[ ΣΝ χὰς: S αι ΚἈΚ) + E 
2 k=—1 


where 


1 1,(Χ) 
C= Ξ dx 
: Lz 


with L,(x) the Lagrange coefficient function defined by (3.4.5) and (3.4.6) when 
n = 2, and where 


ε- [ x(x” — 1)f[—1, 0, 1, x] dx 
FS 


Ϊ ᾿ Ee Ba -- “yn FL-1, 0, 1, x] ἀκ 
τ᾿ 


ze FO Γ᾿ dad — x7)° dx 


Hence deduce the formula 


᾿ f(x) _ ery | _ # piv 
Ι. ED, ἃ = FUCD + 2/0) + FO] τ IO 


where [é| < 1. 
52 Show that when the Filon cosine formula is applied to the rth subinterval 
$[x_{2r}, x_{2r+2}]$ of length 2h, the associated error is given by 

$$\int_{x_{2r}}^{x_{2r+2}} (x - x_{2r})(x - x_{2r+1})(x - x_{2r+2})\, f[x_{2r}, x_{2r+1}, x_{2r+2}, x] \cos kx\,dx$$

Then show that 

$$(x - x_{2r})(x - x_{2r+1})(x - x_{2r+2}) = \frac{d}{dx}\left[\tfrac14 (x - x_{2r})^2 (x_{2r+2} - x)^2\right]$$

and use integration by parts to transform the error expression to the form 

$$-\frac{h^5}{90}\left[f^{iv}(\xi_1) \cos k\xi_2 - 4k f'''(\xi_3) \sin k\xi_4\right]$$

where each $\xi$ is in $[x_{2r}, x_{2r+2}]$. 
53 Use the Filon cosine formula to approximate the integral 


** 31% ρος kx dx = -- m1 — 6 cos kx) 
ἅ 1- κξπ2 


for k = 1 and k = 4. In each case determine the two approximations corre- 
sponding to the spacings ἢ = 7/4 and 7/8, retaining five decimal places. 

54 Suppose that the Filon cosine formula is applied to $f(x) = x^3$ on the sub- 
interval $[x_{2r}, x_{2r+2}]$. 
(a) Show that the associated error can be expressed in the form 

$$\int_{x_{2r}}^{x_{2r+2}} (x - x_{2r})(x - x_{2r+1})(x - x_{2r+2}) \cos kx\,dx$$

[See (2.3.11).] 

(b) Assuming that the error considered in part (a) also can be expressed in the 
form $h^4 \delta(\theta) \sin kx_{2r+1}$, with the notation of (3.10.17), deduce that if the 
term 

$$\frac{h^4}{6} f'''(x_{2r+1})\, \delta(\theta) \sin kx_{2r+1}$$

is added to the approximation in (3.10.11), then the term 

$$\int_{x_{2r}}^{x_{2r+2}} (x - x_{2r})(x - x_{2r+1})(x - x_{2r+2})\, \frac{f'''(x_{2r+1})}{3!} \cos kx\,dx$$

must be subtracted from the integral considered in Prob. 52. 
(c) Use this result to deduce that when $h\,\delta(\theta) S_3$ is added to the approximation 
in (3.10.11), with the notation of (3.10.18), the resultant modified error E then 
satisfies (3.10.20). 


55 Apply the correction (3.10.19) to each of the calculated approximations in Prob. 
53. 
Section 3.11 


56 From the following rounded values of $f(x) = (1 + x)^{-2}$, determine approximate 
values of f’(x) for x = 1.0, 1.1, and 1.2 by use of appropriate three- and five-point 
formulas, estimate the errors, and check the validity of the estimations: 


x 1.0 1.1 1.2 1.3 1.4 


f(x) 0.2500 0.2268 0.2066 0.1890 0.1736 


57 Determine the optimum values of the spacing h for the purpose of approximating 
$f'(0)$ when f(x) = 1/(x + 2) by means of the three-, five-, and seven-point 
differentiation formulas considered in Sec. 3.11 with $x_0 = 0$ when the predictable 
bound on the magnitude of the total error T = E + R is to be minimized, under 
the assumption that each ordinate may be in error by ±0.01. Also calculate the 
corresponding bounds on |T| and show that $|T_7|_{\max} > |T_5|_{\max}$. 

58 Values of a function f(x) are to be determined for x = 0 and for four additional 
positive values of x, and are to be used for the approximate determination of 
f’(0). Assume that use is to be made of (3.11.4) and that the accuracy of the 
calculated values can be guaranteed within only 1 percent, and suppose that the 
true function is f(x) = 1/(1 + x); determine the spacing for which the sum of 
the squares of predictable upper bounds on the truncation and roundoff errors is 
least, and calculate the corresponding upper bound on the total error. Also 
compare this situation with that in which only three ordinates are to be used. 


4 


FINITE-DIFFERENCE INTERPOLATION 


4.1 Introduction 


This chapter returns to the consideration of formulas expressed in terms of 
differences, rather than of the ordinates themselves, but deals only with the 
cases in which the abscissas are equally spaced. Here the rather cumbersome 
notation of divided differences is not needed and is replaced by other notations 
which are explained in Sec. 4.2. 

The most important of the interpolation formulas which involve differences, 
together with error terms, are derived in Secs. 4.3 to 4.7, and their respective 
uses in connection with desk calculation or with the use of a digital computer 
are discussed and illustrated in Sec. 4.8. It is of some historical interest to note 
that the formulas bearing the names of Gauss, Stirling, and Bessel apparently 
were first known to Newton, while the formulas attributed to Newton (Sec. 4.3) 
are due to Gregory. Further, Everett’s first formula is due to Laplace, and 
Everett’s second formula apparently was first given by Steffensen. 

The propagation and detection of errors in given data are considered in 
Sec. 4.9, whereas a useful method of taking certain higher differences into 
approximate account, by modifying certain earlier differences, is illustrated in 
Sec. 4.10. The concluding section of the chapter provides some information
concerning the behavior of the error term in certain interpolation formulas,
as more and more differences are retained, and indicates the practical significance
of that information.


4.2 Difference Notations 


When data are tabulated for uniformly spaced abscissas, with spacing h, it often
is convenient to express formulations for interpolation and related processes
in terms of the differences themselves, rather than the divided differences used
in Chap. 2.
For calculation near a tabular point $x_0$ at the beginning of the tabulated
range, it is conventional to define the forward difference $\Delta f(x_0)$ as

$$\Delta f(x_0) = f(x_0 + h) - f(x_0) \tag{4.2.1}$$


If also $\Delta f(x_0 + h) = f(x_0 + 2h) - f(x_0 + h)$ is known, then the second
forward difference associated with $x_0$ is defined as

$$\begin{aligned}
\Delta^2 f(x_0) &= \Delta f(x_0 + h) - \Delta f(x_0) \\
&= f(x_0 + 2h) - 2 f(x_0 + h) + f(x_0)
\end{aligned} \tag{4.2.2}$$


and succeeding forward differences are defined by iteration. More generally,
we introduce the definitions

$$\Delta f(x) = f(x + h) - f(x) \qquad \Delta^{r+1} f(x) = \Delta^r f(x + h) - \Delta^r f(x) \tag{4.2.3}$$

the spacing h being implied in $\Delta$. If a more specific notation is needed, $\Delta_h$
may be used in place of $\Delta$.

When forward differences are used, it is convenient to number the abscissas
$x_0, x_1, \ldots$ in increasing algebraic order, so that

$$x_{k+1} = x_k + h \tag{4.2.4}$$


Then, with the notation of Sec. 2.3, there follows

$$\begin{aligned}
\Delta f(x_k) &= f(x_{k+1}) - f(x_k) = (x_{k+1} - x_k) f[x_k, x_{k+1}] \\
&= h f[x_k, x_{k+1}] \\
\Delta^2 f(x_k) &= h f[x_{k+1}, x_{k+2}] - h f[x_k, x_{k+1}] \\
&= h (x_{k+2} - x_k) f[x_k, x_{k+1}, x_{k+2}] = 2!\, h^2 f[x_k, x_{k+1}, x_{k+2}]
\end{aligned}$$

and, in general, induction shows that

$$\begin{aligned}
\Delta^r f(x_k) &= (r-1)!\, h^{r-1} f[x_{k+1}, \ldots, x_{k+r}] - (r-1)!\, h^{r-1} f[x_k, \ldots, x_{k+r-1}] \\
&= (r-1)!\, h^{r-1} (x_{k+r} - x_k) f[x_k, x_{k+1}, \ldots, x_{k+r}] \\
&= r!\, h^r f[x_k, x_{k+1}, \ldots, x_{k+r}]
\end{aligned} \tag{4.2.5}$$

The beginning of the corresponding difference table is indicated in Fig.
4.1, where $f_k$ is written for $f(x_k)$. We notice that the subscript remains constant
along each forward diagonal of the table, and that the region of determination
of $\Delta^r f_k$ is bounded by the kth forward diagonal and the (r + k)th backward
diagonal. Hence the difference $\Delta^r f_k$ depends upon the ordinates $f_k, f_{k+1}, \ldots,
f_{k+r}$, as is also indicated by (4.2.5).


FIGURE 4.1
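As a minimal sketch of how such a table may be generated mechanically, each column can be formed by differencing the preceding one. The function name `difference_table` is hypothetical, and the sample data (f(x) = x³ with h = 1) are an assumption chosen only so that the entries are integers.

```python
def difference_table(values):
    """Return [f, Δf, Δ²f, ...]; column r holds Δ^r f_k for k = 0, ..., N - r."""
    columns = [list(values)]
    while len(columns[-1]) > 1:
        prev = columns[-1]
        # Each entry is the difference of two adjacent entries of the previous column.
        columns.append([prev[k + 1] - prev[k] for k in range(len(prev) - 1)])
    return columns


# Illustrative data (an assumption): f(x) = x**3 at x = 0, 1, 2, 3, 4, so h = 1.
cols = difference_table([x**3 for x in range(5)])
print(cols[1])  # first differences Δf_k
print(cols[3])  # third differences Δ³f_k
print(cols[4])  # fourth difference Δ⁴f_0
```

Since the third divided difference of x³ is unity, (4.2.5) predicts that each third difference here equals 3! h³ = 6 and that the fourth difference vanishes.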

For calculation near the end of a tabulated range, the notation of backward
differences is often more convenient. Here we write

$$\nabla f(x) = f(x) - f(x - h) \qquad \nabla^{r+1} f(x) = \nabla^r f(x) - \nabla^r f(x - h) \tag{4.2.6}$$

If the abscissas are again numbered in accordance with (4.2.4), there follows

$$\begin{aligned}
\nabla f(x_k) &= f(x_k) - f(x_{k-1}) = (x_k - x_{k-1}) f[x_k, x_{k-1}] \\
&= h f[x_k, x_{k-1}]
\end{aligned}$$

and, in general,

$$\nabla^r f(x_k) = r!\, h^r f[x_k, x_{k-1}, \ldots, x_{k-r}] \tag{4.2.7}$$

in analogy with (4.2.5).
The end of the corresponding difference table is indicated in Fig. 4.2.


FIGURE 4.2


Here the subscript remains constant along each backward diagonal. Also,
it is seen that the difference $\nabla^r f_k$ depends upon the ordinates $f_{k-r}, f_{k-r+1}, \ldots, f_k$,
as is also indicated by (4.2.7).

For the remaining calculation, the notation of so-called central differences
is usually most convenient. If the calculation is to be effected near a certain
interior tabular point, it is convenient to number that abscissa as $x_0$, and to
number forward abscissas as $x_1, x_2, \ldots$ and backward abscissas as $x_{-1}, x_{-2}, \ldots$,
so that (4.2.4) again holds. In the central-difference notation, one writes

$$\begin{aligned}
\delta f(x) &= f(x + \tfrac12 h) - f(x - \tfrac12 h) \\
\delta^{r+1} f(x) &= \delta^r f(x + \tfrac12 h) - \delta^r f(x - \tfrac12 h)
\end{aligned} \tag{4.2.8}$$

It is seen that $\delta f_k = \delta f(x_k)$ generally does not involve tabulated ordinates.
However, the second central difference

$$\begin{aligned}
\delta^2 f_k &= \delta f(x_k + \tfrac12 h) - \delta f(x_k - \tfrac12 h) \\
&= [f(x_k + h) - f(x_k)] - [f(x_k) - f(x_k - h)] \\
&= f_{k+1} - 2 f_k + f_{k-1}
\end{aligned}$$

does involve tabular entries, and the same is seen to be true of all central
differences $\delta^{2m} f_k$ of even order. Furthermore, we may notice that

$$\delta f_{k+(1/2)} = f_{k+1} - f_k$$

and, more generally, that $\delta^{2m+1} f_{k \pm (1/2)}$ involves only ordinates at tabulated arguments.
With the notation of Sec. 2.3, we may write, for example,

$$\delta f_{1/2} = f_1 - f_0 = h f[x_0, x_1] \qquad \delta f_{-1/2} = f_0 - f_{-1} = h f[x_0, x_{-1}]$$

$$\delta^2 f_1 = \delta f_{3/2} - \delta f_{1/2} = h f[x_1, x_2] - h f[x_0, x_1] = 2!\, h^2 f[x_0, x_1, x_2]$$

and, in general,

$$\delta^{2m+1} f_{k+(1/2)} = h^{2m+1} (2m+1)!\, f[x_{k-m}, \ldots, x_k, \ldots, x_{k+m}, x_{k+m+1}] \tag{4.2.9}$$

$$\delta^{2m+1} f_{k-(1/2)} = h^{2m+1} (2m+1)!\, f[x_{k-m-1}, x_{k-m}, \ldots, x_k, \ldots, x_{k+m}] \tag{4.2.10}$$

and

$$\delta^{2m} f_k = h^{2m} (2m)!\, f[x_{k-m}, \ldots, x_k, \ldots, x_{k+m}] \tag{4.2.11}$$

The portion of the corresponding difference table in the neighborhood of an
interior tabular point $x_0$, near which calculations are to be made, is indicated
in Fig. 4.3. Here the subscript remains constant along horizontal lines of the
table, which pass through differences of only even or only odd orders.


FIGURE 4.3

Thus, once a set of adjacent entries in a difference table has been numbered,
three different sets of notations are available for the differences themselves,
as may be seen from the composite Fig. 4.4. Any one of these sets of
notations would suffice. However, each possesses certain advantages in certain
applications, as will be seen.


x_0   f_0
              Δf_0 = δf_{1/2} = ∇f_1
x_1   f_1                               Δ²f_0 = δ²f_1 = ∇²f_2
              Δf_1 = δf_{3/2} = ∇f_2
x_2   f_2

FIGURE 4.4


4.3 Newton Forward- and Backward-difference Formulas 


In order to obtain an interpolation formula such that the retention of n + 1
terms leads to the polynomial of degree n taking on the values of f(x) at $x_0$,
$x_1 = x_0 + h, \ldots, x_n = x_0 + nh$, we may refer to Newton’s divided-difference
formula (2.5.2), making use of the relation

$$f[x_0, x_1, \ldots, x_k] = \frac{\Delta^k f_0}{k!\, h^k} \tag{4.3.1}$$

which follows from (4.2.5), to obtain the result

$$\begin{aligned}
f(x) = f_0 &+ (x - x_0) \frac{\Delta f_0}{1!\, h} + (x - x_0)(x - x_1) \frac{\Delta^2 f_0}{2!\, h^2} + \cdots \\
&+ (x - x_0)(x - x_1) \cdots (x - x_{n-1}) \frac{\Delta^n f_0}{n!\, h^n} + E(x)
\end{aligned} \tag{4.3.2}$$

where

$$E(x) = (x - x_0)(x - x_1) \cdots (x - x_n) \frac{f^{(n+1)}(\xi)}{(n+1)!} \tag{4.3.3}$$

and where $\xi$ is in the interval spanned by $x_0, \ldots, x_n$, and x, in accordance
with (2.5.3) and (2.6.8).
The formula takes on a simpler form if we introduce a dimensionless
variable s defined as distance from $x_0$ in units of h:

$$s = \frac{x - x_0}{h} \qquad x = x_0 + hs \tag{4.3.4}$$

Since then there follows also $x - x_k = h(s - k)$, the preceding formula takes
the form

$$\begin{aligned}
f_s = f_0 &+ s\, \Delta f_0 + \frac{s(s-1)}{2!} \Delta^2 f_0 + \cdots \\
&+ \frac{s(s-1) \cdots (s-n+1)}{n!} \Delta^n f_0 + E_s
\end{aligned} \tag{4.3.5}$$

where

$$E_s = h^{n+1} \frac{s(s-1)(s-2) \cdots (s-n)}{(n+1)!} f^{(n+1)}(\xi) \tag{4.3.6}$$

and where we have written

$$f_s = f(x_0 + hs) = f(x) \qquad E_s = E(x_0 + hs) = E(x)$$

This formula (or, more properly, the result of neglecting the error term $E_s$)
is known as Newton’s forward-difference formula for interpolation. It makes use
of the difference path indicated in Fig. 4.5.


FIGURE 4.5 (forward-diagonal difference path $f_0, \Delta f_0, \Delta^2 f_0, \ldots$)
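The forward-difference formula (4.3.5) lends itself to a short routine in which the coefficient s(s - 1)···(s - k + 1)/k! is built up recursively while the difference column is regenerated. The following is only a sketch under stated assumptions: the name `newton_forward` is hypothetical, and the sample data f(x) = x² + 1 with h = 1 are chosen merely so that the exact value is known.

```python
def newton_forward(f, s, n=None):
    """Evaluate (4.3.5) without the error term, where x = x_0 + h*s.

    f holds the ordinates f_0, f_1, ...; n is the order of the last difference kept.
    """
    if n is None:
        n = len(f) - 1
    if n > len(f) - 1:
        raise ValueError("not enough ordinates for the requested difference order")
    diffs = list(f)   # current difference column; diffs[0] is Δ^k f_0
    coeff = 1.0       # s(s-1)...(s-k+1)/k!
    total = 0.0
    for k in range(n + 1):
        total += coeff * diffs[0]
        coeff *= (s - k) / (k + 1)
        diffs = [diffs[i + 1] - diffs[i] for i in range(len(diffs) - 1)]
    return total


# Illustrative data (an assumption): f(x) = x**2 + 1 at x = 0, 1, 2, 3 (h = 1).
ordinates = [1, 2, 5, 10]
print(newton_forward(ordinates, 0.5, n=1))  # linear interpolation
print(newton_forward(ordinates, 0.5, n=2))  # second differences retained
```

Because the data come from a polynomial of degree two, the result with n = 2 reproduces f(0.5) = 1.25 exactly, and the third difference contributes nothing.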


In a similar way, if we require a formula successively introducing the
ordinates at $x_N, x_{N-1}, x_{N-2}$, and so forth, we may replace $x_0$ by $x_N$, $x_1$ by
$x_{N-1}, \ldots, x_n$ by $x_{N-n}$ in (2.5.2):

$$\begin{aligned}
f(x) = f(x_N) &+ (x - x_N) f[x_N, x_{N-1}] \\
&+ (x - x_N)(x - x_{N-1}) f[x_N, x_{N-1}, x_{N-2}] + \cdots \\
&+ (x - x_N)(x - x_{N-1}) \cdots (x - x_{N-n+1}) f[x_N, x_{N-1}, \ldots, x_{N-n}] \\
&+ E(x)
\end{aligned}$$

and, writing here

$$s = \frac{x - x_N}{h} \qquad x = x_N + hs \tag{4.3.7}$$

we may use (4.2.7) to reduce this result to the form

$$\begin{aligned}
f_{N+s} = f_N &+ s\, \nabla f_N + \frac{s(s+1)}{2!} \nabla^2 f_N + \cdots \\
&+ \frac{s(s+1) \cdots (s+n-1)}{n!} \nabla^n f_N + E_s
\end{aligned} \tag{4.3.8}$$

where

$$E_s = h^{n+1} \frac{s(s+1) \cdots (s+n)}{(n+1)!} f^{(n+1)}(\xi) \tag{4.3.9}$$

This formula is known as Newton’s backward-difference formula, when $E_s$ is
neglected, and it utilizes the difference path indicated in Fig. 4.6.


FIGURE 4.6 (backward-diagonal difference path $f_N, \nabla f_N, \nabla^2 f_N, \ldots$)


If r + 1 terms are retained in (4.3.5), the polynomial agreeing with
f(x) at $x_0, x_1, \ldots, x_r$ is obtained; the retention of r + 1 terms in (4.3.8)
yields the polynomial agreeing with f(x) at $x_N, x_{N-1}, \ldots, x_{N-r}$. If N + 1
terms were retained in each formula, the two formulas would involve the same
ordinates and would yield the same polynomial approximation.

More generally, the former would be used near the beginning of a tabula- 
tion (at which only forward differences are available) and the latter would be 
used near the end (where only backward differences are available). In particular, 
the backward-difference formula is especially useful in extending a tabulation, 
and for generating other formulas useful for advancing numerical solutions of 
differential equations. For this reason, s was measured forward in the table in 
both formulas, so that it is positive for extrapolation in (4.3.8), whereas it is 
positive for interpolation in (4.3.5). Either formula can, of course, be used for 
either interpolation or extrapolation. 
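The use of (4.3.8) for extending a tabulation may be sketched as follows, with s measured forward from the last tabular point so that s = 1 corresponds to the next entry. The name `newton_backward` is hypothetical and the data (again f(x) = x² + 1, h = 1) are an illustrative assumption.

```python
def newton_backward(f, s, n=None):
    """Evaluate (4.3.8) without the error term, where x = x_N + h*s.

    f holds the ordinates ..., f_{N-1}, f_N; n is the order of the last difference kept.
    """
    if n is None:
        n = len(f) - 1
    if n > len(f) - 1:
        raise ValueError("not enough ordinates for the requested difference order")
    diffs = list(f)   # current difference column; diffs[-1] is ∇^k f_N
    coeff = 1.0       # s(s+1)...(s+k-1)/k!
    total = 0.0
    for k in range(n + 1):
        total += coeff * diffs[-1]
        coeff *= (s + k) / (k + 1)
        diffs = [diffs[i + 1] - diffs[i] for i in range(len(diffs) - 1)]
    return total


# Illustrative data (an assumption): f(x) = x**2 + 1 at x = 0, 1, 2, 3 (h = 1).
ordinates = [1, 2, 5, 10]
print(newton_backward(ordinates, 1.0))   # extrapolation: extends the table to x = 4
print(newton_backward(ordinates, -0.5))  # interpolation at x = 2.5
```

For these polynomial data both results are exact (17 and 7.25, respectively); with tabulated data of other kinds the neglected term (4.3.9) would, of course, have to be considered.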

The formulas can be written in more concise form in terms of the binomial
coefficients

$$\binom{s}{k} = \frac{s(s-1) \cdots (s-k+1)}{k!} \tag{4.3.10}$$

With this notation, the forward-difference formula becomes merely

$$f_s = f(x_0 + hs) \approx \sum_{k=0}^{n} \binom{s}{k} \Delta^k f_0 \tag{4.3.11}$$

Further, the coefficient of $\nabla^k f_N$ in (4.3.8) is seen to be

$$\frac{s(s+1) \cdots (s+k-1)}{k!} = (-1)^k \frac{(-s)(-s-1) \cdots (-s-k+1)}{k!} = (-1)^k \binom{-s}{k} \tag{4.3.12}$$

so that the backward-difference formula takes the form

$$f_{N+s} = f(x_N + hs) \approx \sum_{k=0}^{n} (-1)^k \binom{-s}{k} \nabla^k f_N \tag{4.3.13}$$

or, alternatively,

$$f_{N-s} = f(x_N - hs) \approx \sum_{k=0}^{n} (-1)^k \binom{s}{k} \nabla^k f_N \tag{4.3.14}$$

In the form (4.3.14), s clearly is positive for interpolation.

Extensive tables of the coefficient functions may be found in the literature 
(see references in Sec. 4.13). A brief table, for interpolation or extrapolation by 
tenths, through fifth differences, is included in Sec. 4.12 for desk calculation. 
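When coefficient tables are not at hand, the coefficient functions of (4.3.11) and (4.3.13) are easily generated. The following sketch (the helper name `binom` is hypothetical) evaluates the generalized binomial coefficient (4.3.10) for nonintegral s and lists the coefficients through fourth differences for two sample arguments.

```python
def binom(s, k):
    """Generalized binomial coefficient s(s-1)...(s-k+1)/k! of (4.3.10)."""
    c = 1.0
    for j in range(k):
        c *= (s - j) / (j + 1)
    return c


# Coefficients of Δ^k f_0 in (4.3.11) for s = 0.2, k = 0, ..., 4.
forward_coeffs = [binom(0.2, k) for k in range(5)]

# Coefficients of ∇^k f_N in (4.3.13) for s = -0.5, using (4.3.12).
s = -0.5
backward_coeffs = [(-1) ** k * binom(-s, k) for k in range(5)]

print([round(c, 5) for c in forward_coeffs])
print([round(c, 5) for c in backward_coeffs])
```

The rounding to five decimals in the print statements is only for display; the unrounded values would be used in an actual interpolation.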


4.4 Gaussian Formulas 


For interpolation at a point x, it is desirable to have available a formula in 
which the successively introduced ordinates correspond to abscissas which are 
as near as possible to x. If x is near one end of the tabulation, the newtonian 
formulas of the preceding section serve this purpose as well as is possible. 
Otherwise, it is convenient to start with the abscissa $x_0$ nearest x, then to
introduce $x_1$ and $x_{-1}$, then $x_2$ and $x_{-2}$, and so forth.

If the ordinates are introduced in the order $f_0, f_1, f_{-1}, f_2, f_{-2}, \ldots$, the
result of replacing $x_0, x_1, x_2, x_3, x_4, \ldots$ by $x_0, x_1, x_{-1}, x_2, x_{-2}, \ldots$ in
(2.5.2), and the subsequent use of (4.2.9) and (4.2.11), with k = 0, leads to the
form

$$\begin{aligned}
f(x) = f_0 &+ (x - x_0) \frac{\delta f_{1/2}}{1!\, h} + (x - x_0)(x - x_1) \frac{\delta^2 f_0}{2!\, h^2} \\
&+ (x - x_0)(x - x_1)(x - x_{-1}) \frac{\delta^3 f_{1/2}}{3!\, h^3} \\
&+ (x - x_0)(x - x_1)(x - x_{-1})(x - x_2) \frac{\delta^4 f_0}{4!\, h^4} + \cdots
\end{aligned}$$

If we write

$$s = \frac{x - x_0}{h} \qquad x = x_0 + hs \tag{4.4.1}$$

this result takes the form

$$\begin{aligned}
f_s = f_0 &+ s\, \delta f_{1/2} + \frac{s(s-1)}{2!} \delta^2 f_0 + \frac{s(s^2 - 1^2)}{3!} \delta^3 f_{1/2} \\
&+ \frac{s(s^2 - 1^2)(s - 2)}{4!} \delta^4 f_0 + \cdots \\
&+ \begin{cases}
\dfrac{s(s^2 - 1^2) \cdots [s^2 - (m-1)^2](s - m)}{(2m)!}\, \delta^{2m} f_0 \\
\text{or} \\
\dfrac{s(s^2 - 1^2) \cdots (s^2 - m^2)}{(2m+1)!}\, \delta^{2m+1} f_{1/2}
\end{cases} \\
&+ E_s
\end{aligned} \tag{4.4.2}$$

where, if nth differences are retained, n = 2m when n is even and n = 2m + 1
when n is odd. The error term takes the form

$$E_s = h^{2m+1} \frac{s(s^2 - 1^2) \cdots (s^2 - m^2)}{(2m+1)!} f^{(2m+1)}(\xi) \tag{4.4.3}$$

when n = 2m, and the form

$$E_s = h^{2m+2} \frac{s(s^2 - 1^2) \cdots (s^2 - m^2)(s - m - 1)}{(2m+2)!} f^{(2m+2)}(\xi) \tag{4.4.4}$$

when n = 2m + 1. This formula employs the forward zigzag difference path
indicated in Fig. 4.7 and is known as Gauss’ forward formula.


FIGURE 4.7 (forward and backward zigzag difference paths through $x_0$)


In a completely similar way, by introducing the ordinates in the sequence
$f_0, f_{-1}, f_1, f_{-2}, f_2, \ldots$, using (4.2.10) and (4.2.11) with k = 0, and again
introducing the abbreviation (4.4.1), we obtain the form

$$\begin{aligned}
f_s = f_0 &+ s\, \delta f_{-1/2} + \frac{s(s+1)}{2!} \delta^2 f_0 + \frac{s(s^2 - 1^2)}{3!} \delta^3 f_{-1/2} \\
&+ \frac{s(s^2 - 1^2)(s + 2)}{4!} \delta^4 f_0 + \cdots \\
&+ \begin{cases}
\dfrac{s(s^2 - 1^2) \cdots [s^2 - (m-1)^2](s + m)}{(2m)!}\, \delta^{2m} f_0 \\
\text{or} \\
\dfrac{s(s^2 - 1^2) \cdots (s^2 - m^2)}{(2m+1)!}\, \delta^{2m+1} f_{-1/2}
\end{cases} \\
&+ E_s
\end{aligned} \tag{4.4.5}$$

where

$$E_s = h^{2m+1} \frac{s(s^2 - 1^2) \cdots (s^2 - m^2)}{(2m+1)!} f^{(2m+1)}(\xi)$$

or

$$E_s = h^{2m+2} \frac{s(s^2 - 1^2) \cdots (s^2 - m^2)(s + m + 1)}{(2m+2)!} f^{(2m+2)}(\xi) \tag{4.4.6}$$

according as the formula is terminated with even or odd differences. This
formula utilizes the backward zigzag difference path in Fig. 4.7 and is known
as Gauss’ backward formula.

When terminated with an even difference, of order 2m, both formulas
yield the polynomial agreeing with f(x) at $x_0, x_{\pm 1}, \ldots, x_{\pm m}$ and hence are
completely equivalent in that case. However, when terminated with an odd
difference, of order 2m + 1, the forward formula gives the polynomial agreeing
with f(x) at $x_0, x_{\pm 1}, \ldots, x_{\pm m}$ and $x_{m+1}$, whereas the backward formula
yields agreement at the first 2m + 1 points and at $x_{-m-1}$. In this latter case,
when seeking f(x), the forward formula would be expected to afford somewhat
better results when x is between $x_0$ and $x_1$, whereas the backward formula
would generally be preferred when x is between $x_0$ and $x_{-1}$. Neither of these
formulas is of frequent practical use, but from them other more useful formulas 
may be derived. 
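The equivalence of the two gaussian formulas at even order, and their divergence at odd order, can be checked numerically. The sketch below is an illustration only: the names `difference_columns` and `gauss` are hypothetical, the list index c marks the ordinate $f_0$, and the quartic sample data are an assumption. It is also assumed that enough ordinates lie on each side of $x_0$ for the differences requested.

```python
def difference_columns(f):
    """cols[r][i] is the r-th forward difference starting at list index i."""
    cols = [list(f)]
    while len(cols[-1]) > 1:
        prev = cols[-1]
        cols.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return cols


def gauss(f, c, s, n, forward=True):
    """Gauss forward (4.4.2) or backward (4.4.5) formula, error term omitted.

    f[c] is f_0, s = (x - x_0)/h, and n is the order of the last difference kept.
    """
    cols = difference_columns(f)
    sign = 1 if forward else -1
    total, coeff = 0.0, 1.0
    for k in range(n + 1):
        m = k // 2
        if k % 2 == 0 or forward:
            start = c - m        # δ^{2m} f_0, or δ^{2m+1} f_{1/2}
        else:
            start = c - m - 1    # δ^{2m+1} f_{-1/2}
        total += coeff * cols[k][start]
        # Subscript of the abscissa whose factor enters next:
        # 0, 1, -1, 2, -2, ... (forward) or 0, -1, 1, -2, 2, ... (backward).
        node = sign * ((k + 1) // 2 if k % 2 else -(k // 2))
        coeff *= (s - node) / (k + 1)
    return total


# Illustrative data (an assumption): f(x) = x**4 at x = -2, ..., 2 (h = 1), f_0 = f[2].
data = [x**4 for x in range(-2, 3)]
for n in (1, 2, 3, 4):
    print(n, gauss(data, 2, 0.5, n, True), gauss(data, 2, 0.5, n, False))
```

For n = 2 and n = 4 the two columns of output agree, since the same ordinates are involved; for n = 1 and n = 3 they differ, since the forward formula then uses $x_{m+1}$ and the backward formula $x_{-m-1}$.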


4.5 Stirling’s Formula 


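When interpolations are to be effected for values of x near an interior point $x_0$,
say, between $x_0 - \tfrac12 h$ and $x_0 + \tfrac12 h$, a formula of frequent use may be obtained
by forming the mean of the gaussian forward and backward formulas, and so
introducing a symmetry about the abscissa $x_0$:

$$\begin{aligned}
f_s = f_0 &+ \frac{s}{2} (\delta f_{1/2} + \delta f_{-1/2}) + \frac{s}{2} \cdot \frac{(s - 1) + (s + 1)}{2!}\, \delta^2 f_0 \\
&+ \frac{s(s^2 - 1^2)}{2 \cdot 3!} (\delta^3 f_{1/2} + \delta^3 f_{-1/2}) + \cdots \\
&+ \begin{cases}
\dfrac{s(s^2 - 1^2) \cdots [s^2 - (m-1)^2]}{2} \cdot \dfrac{(s - m) + (s + m)}{(2m)!}\, \delta^{2m} f_0 \\
\text{or} \\
\dfrac{s(s^2 - 1^2) \cdots (s^2 - m^2)}{2\, (2m+1)!} (\delta^{2m+1} f_{1/2} + \delta^{2m+1} f_{-1/2})
\end{cases} \\
&+ E_s
\end{aligned} \tag{4.5.1}$$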


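It is then convenient to introduce symbols for the mean odd differences
which appear in this formula. The notation

$$\mu f(x) = \tfrac12 \left[ f\left(x + \tfrac{h}{2}\right) + f\left(x - \tfrac{h}{2}\right) \right] \tag{4.5.2}$$

is often used, so that, for example, we may write

$$\mu\delta f_0 = \tfrac12 (\delta f_{1/2} + \delta f_{-1/2}) \qquad \mu\delta^3 f_0 = \tfrac12 (\delta^3 f_{1/2} + \delta^3 f_{-1/2}) \tag{4.5.3}$$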

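With this notation for the so-called mean central differences of odd order,
(4.5.1) takes the form

$$\begin{aligned}
f_s = f_0 &+ s\, \mu\delta f_0 + \frac{s^2}{2!} \delta^2 f_0 + \frac{s(s^2 - 1^2)}{3!} \mu\delta^3 f_0 \\
&+ \frac{s^2 (s^2 - 1^2)}{4!} \delta^4 f_0 + \cdots \\
&+ \begin{cases}
\dfrac{s^2 (s^2 - 1^2) \cdots [s^2 - (m-1)^2]}{(2m)!}\, \delta^{2m} f_0 \\
\text{or} \\
\dfrac{s(s^2 - 1^2) \cdots (s^2 - m^2)}{(2m+1)!}\, \mu\delta^{2m+1} f_0
\end{cases} \\
&+ E_s
\end{aligned} \tag{4.5.4}$$

The result of omitting $E_s$ is known as Stirling’s formula for interpolation and
corresponds to the array of Fig. 4.8.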


FIGURE 4.8


Since the errors associated with terminating (4.4.2) and (4.4.5) with an
even difference are identical, there follows also

$$E_s = h^{2m+1} \frac{s(s^2 - 1^2) \cdots (s^2 - m^2)}{(2m+1)!} f^{(2m+1)}(\xi) \tag{4.5.5}$$

when n = 2m. As in the preceding cases, $\xi$ lies between the largest and smallest
of the abscissas involved in the formula (here $x_0, x_{\pm 1}, \ldots, x_{\pm m}$, and x).


However, when n = 2m + 1, the mean of the errors (4.4.4) and (4.4.6)
takes the form

$$E_s = h^{2m+2} \frac{s(s^2 - 1^2) \cdots (s^2 - m^2)}{2\, (2m+2)!} \left[ (s - m - 1) f^{(2m+2)}(\xi_1) + (s + m + 1) f^{(2m+2)}(\xi_2) \right] \tag{4.5.6}$$

where both $\xi_1$ and $\xi_2$ lie inside the interval spanned by $x_0, x_{\pm 1}, \ldots, x_{\pm(m+1)}$
and x. Thus, when Stirling’s formula is terminated with an odd difference, the
error term does not take a simple form similar to (4.5.5). It should be noticed
that the interpolation polynomial of degree 2m + 1, which is yielded by the
formula in this case, agrees with f(x) at the 2m + 1 points $x_0, x_{\pm 1}, \ldots, x_{\pm m}$
but that an additional (2m + 2)th point of agreement (which would serve to
specify the polynomial) is not known.

The Stirling formula is equivalent to either Gauss formula when ter- 
minated with even differences. But even in this case its form is more convenient 
because of the fact that the coefficients of the differences of even order are even 
functions of s, whereas the coefficients of the mean differences of odd order are 
odd functions of s. A brief table of the coefficients is presented in Sec. 4.12. 
More extensive tables can be found in the literature (see references in Sec. 4.13). 
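A direct transcription of (4.5.4) may be sketched as follows. The names `difference_columns` and `stirling` are hypothetical, the list index c marks $f_0$, and the sample ordinates are assumed to be five-place values of sin x at x = 1.0(0.1)1.4, with $x_0$ = 1.2.

```python
def difference_columns(f):
    """cols[r][i] is the r-th forward difference starting at list index i."""
    cols = [list(f)]
    while len(cols[-1]) > 1:
        prev = cols[-1]
        cols.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return cols


def stirling(f, c, s, n):
    """Stirling's formula (4.5.4) through nth differences, error term omitted."""
    cols = difference_columns(f)
    total = cols[0][c]
    a = s  # coefficient s(s²-1²)...(s²-m²)/(2m+1)! of the mean odd difference
    for k in range(1, n + 1):
        if k % 2 == 1:
            m = (k - 1) // 2
            # μδ^{2m+1} f_0 is the mean of δ^{2m+1} f_{1/2} and δ^{2m+1} f_{-1/2}.
            mean_odd = 0.5 * (cols[k][c - m] + cols[k][c - m - 1])
            total += a * mean_odd
        else:
            # Even coefficient is the preceding odd coefficient times s/k.
            total += a * s / k * cols[k][c - k // 2]
            a *= (s * s - (k // 2) ** 2) / (k * (k + 1))
    return total


# Assumed sample data: five-place sin x at x = 1.0, 1.1, 1.2, 1.3, 1.4.
sines = [0.84147, 0.89121, 0.93204, 0.96356, 0.98545]
print(round(stirling(sines, 2, 0.2, 4), 5))  # approximates sin 1.22
```

The even coefficients so generated are even functions of s and the odd ones odd functions of s, so that the same routine serves for negative s without change.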


4.6 Bessel’s Formula 


Whereas Stirling’s formula is principally intended for interpolation near a
tabular entry $x_0$, the need frequently arises for a formula designed for
interpolation over the interval, say, between $x_0$ and $x_1$. In order to obtain a formula
in which the array of differences involved is symmetric about a horizontal line
midway between $x_0$ and $x_1$, we again make use of the gaussian forward formula

$$f_s = f_0 + s\, \delta f_{1/2} + \frac{s(s-1)}{2!} \delta^2 f_0 + \frac{(s+1)s(s-1)}{3!} \delta^3 f_{1/2} + \cdots \tag{4.6.1}$$


which involves the differences along the forward zigzag of Fig. 4.7, and combine
it with a formula which involves the differences along the backward zigzag
of that figure. The latter formula may be obtained most easily by noticing that,
if s were to be measured from $x_1$, that formula would be obtained by advancing
all subscripts in the gaussian backward formula by unity. Hence, if s is to be
measured from $x_0$ in both formulas, we must advance the subscripts in (4.4.5)
by unity and, at the same time, replace s by s - 1, to give the result

$$f_s = f_1 + (s - 1)\, \delta f_{1/2} + \frac{s(s-1)}{2!} \delta^2 f_1 + \frac{s(s-1)(s-2)}{3!} \delta^3 f_{1/2} + \cdots \tag{4.6.2}$$

The mean of (4.6.1) and (4.6.2) then takes the form

$$\begin{aligned}
f_s = \mu f_{1/2} &+ (s - \tfrac12)\, \delta f_{1/2} + \frac{s(s-1)}{2!} \mu\delta^2 f_{1/2} \\
&+ \frac{s(s-1)(s - \tfrac12)}{3!} \delta^3 f_{1/2} + \cdots \\
&+ \begin{cases}
\dfrac{s(s^2 - 1^2) \cdots [s^2 - (m-1)^2](s - m)}{(2m)!}\, \mu\delta^{2m} f_{1/2} \\
\text{or} \\
\dfrac{s(s^2 - 1^2) \cdots [s^2 - (m-1)^2](s - m)(s - \tfrac12)}{(2m+1)!}\, \delta^{2m+1} f_{1/2}
\end{cases} \\
&+ E_s
\end{aligned} \tag{4.6.3}$$

and is known as Bessel’s formula. The associated data array is indicated in
Fig. 4.9.


FIGURE 4.9


When terminated with an odd difference, of order 2m + 1, both (4.6.1)
and (4.6.2) yield the polynomial of degree 2m + 1 agreeing with f(x) when
$x = x_0, x_{\pm 1}, \ldots, x_{\pm m}$ and $x_{m+1}$. Hence the same statement applies to
Bessel’s formula, and the error in that case is consequently identical with (4.4.4)

$$E_s = h^{2m+2} \frac{s(s^2 - 1^2) \cdots (s^2 - m^2)(s - m - 1)}{(2m+2)!} f^{(2m+2)}(\xi) \tag{4.6.4}$$

when n = 2m + 1. However, when Bessel’s formula is terminated with even
differences the error term is obtained as the mean of (4.4.3) and the first form
of (4.4.6) with s replaced by s - 1, noticing that the parameter $\xi$ is not generally
the same in the two expressions, in the less simple form

$$E_s = h^{2m+1} \frac{s(s^2 - 1^2) \cdots [s^2 - (m-1)^2](s - m)}{2\, (2m+1)!} \left[ (s + m) f^{(2m+1)}(\xi_1) + (s - m - 1) f^{(2m+1)}(\xi_2) \right] \tag{4.6.5}$$

when n = 2m, where $\xi_1$ and $\xi_2$ lie inside the interval spanned by $x_0, x_{\pm 1}, \ldots,
x_{\pm m}, x_{m+1}$, and x.

A brief table of the coefficients appearing in (4.6.3) is given in Sec. 4.12.
More extensive tables are available in the literature (see Sec. 4.13). The
symmetry of this formula about the midpoint of the interval $(x_0, x_1)$ becomes more
evident if we write $s = t + \tfrac12$, so that

$$t = \frac{x - \tfrac12 (x_0 + x_1)}{h} = s - \tfrac12 \tag{4.6.6}$$

and hence t is distance measured from that midpoint in units of h. It is readily
verified that Bessel’s formula then takes the equivalent form

$$\begin{aligned}
f_{t+(1/2)} = \mu f_{1/2} &+ t\, \delta f_{1/2} + \frac{t^2 - \tfrac14}{2!} \mu\delta^2 f_{1/2} + \frac{t (t^2 - \tfrac14)}{3!} \delta^3 f_{1/2} \\
&+ \frac{(t^2 - \tfrac14)(t^2 - \tfrac94)}{4!} \mu\delta^4 f_{1/2} + \frac{t (t^2 - \tfrac14)(t^2 - \tfrac94)}{5!} \delta^5 f_{1/2} + \cdots
\end{aligned} \tag{4.6.7}$$

where the terminating term and the corresponding error term are obtainable
by introducing (4.6.6) into the forms given in (4.6.3) to (4.6.5). Thus we see
that the coefficients of mean even differences are even functions of t, whereas
the coefficients of odd differences are odd functions of t.
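Bessel’s formula in the form (4.6.3) may be sketched in the same manner. The names `difference_columns` and `bessel` are hypothetical; the list index c marks $f_0$, so that the interval of interest lies between f[c] and f[c + 1]; and the sample ordinates are assumed five-place values of sin x at x = 1.0(0.1)1.5, with $x_0$ = 1.2.

```python
def difference_columns(f):
    """cols[r][i] is the r-th forward difference starting at list index i."""
    cols = [list(f)]
    while len(cols[-1]) > 1:
        prev = cols[-1]
        cols.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return cols


def bessel(f, c, s, n):
    """Bessel's formula (4.6.3) through nth differences, error term omitted."""
    cols = difference_columns(f)
    total = 0.0
    b = 1.0  # coefficient s(s²-1²)...[s²-(m-1)²](s-m)/(2m)! of the mean even difference
    for k in range(n + 1):
        m = k // 2
        if k % 2 == 0:
            # μδ^{2m} f_{1/2} is the mean of δ^{2m} f_0 and δ^{2m} f_1.
            total += b * 0.5 * (cols[k][c - m] + cols[k][c - m + 1])
        else:
            # Odd coefficient is the preceding even coefficient times (s - 1/2)/k.
            total += b * (s - 0.5) / k * cols[k][c - m]
            b *= (s + m) * (s - m - 1) / (k * (k + 1))
    return total


# Assumed sample data: five-place sin x at x = 1.0, 1.1, ..., 1.5.
sines = [0.84147, 0.89121, 0.93204, 0.96356, 0.98545, 0.99749]
print(round(bessel(sines, 2, 0.2, 4), 5))  # approximates sin 1.22
print(round(bessel(sines, 2, 0.5, 4), 5))  # midpoint of the interval (1.2, 1.3)
```

At s = 1/2 every odd-order term carries the factor s - 1/2 and therefore drops out, which is the symmetry about the midpoint that the variable t makes explicit.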

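An important special case results, by setting $s = \tfrac12$ in (4.6.3) or t = 0 in
(4.6.7), in the form

$$\begin{aligned}
f_{1/2} = \tfrac12 (f_0 + f_1) &- \tfrac{1}{16} (\delta^2 f_0 + \delta^2 f_1) + \tfrac{3}{256} (\delta^4 f_0 + \delta^4 f_1) \\
&- \tfrac{5}{2048} (\delta^6 f_0 + \delta^6 f_1) + \cdots \\
&+ (-1)^m \frac{[1 \cdot 3 \cdot 5 \cdots (2m-1)]^2}{2^{2m+1} (2m)!} (\delta^{2m} f_0 + \delta^{2m} f_1) + E_{1/2}
\end{aligned} \tag{4.6.8}$$

where

$$E_{1/2} = (-1)^{m+1} h^{2m+2} \frac{[1 \cdot 3 \cdot 5 \cdots (2m+1)]^2}{2^{2m+2} (2m+2)!} f^{(2m+2)}(\xi) \tag{4.6.9}$$

and where $x_0 - mh < \xi < x_1 + mh$. This formula is known as the formula
for interpolating to halves and is particularly useful in subtabulation of data.


4.7 Everett’s Formulas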


In many tabulations, auxiliary tables of central differences of even orders
(usually $\delta^2 f$ and $\delta^4 f$) are provided. In order to obtain an interpolation formula
which involves only central differences of even order, we may, for example,
start with the Gauss forward formula, terminated with an odd difference and
written in the form

$$\begin{aligned}
f_s = (f_0 + s\, \delta f_{1/2}) &+ \frac{s(s-1)}{2!} \left( \delta^2 f_0 + \frac{s+1}{3} \delta^3 f_{1/2} \right) \\
&+ \frac{s(s^2 - 1^2)(s - 2)}{4!} \left( \delta^4 f_0 + \frac{s+2}{5} \delta^5 f_{1/2} \right) + \cdots \\
&+ \frac{s(s^2 - 1^2) \cdots [s^2 - (m-1)^2](s - m)}{(2m)!} \left( \delta^{2m} f_0 + \frac{s+m}{2m+1} \delta^{2m+1} f_{1/2} \right) \\
&+ E_s
\end{aligned} \tag{4.7.1}$$

where

$$E_s = h^{2m+2} \frac{s(s^2 - 1^2) \cdots (s^2 - m^2)(s - m - 1)}{(2m+2)!} f^{(2m+2)}(\xi) \tag{4.7.2}$$

If we now make use of the relations

$$\delta f_{1/2} = f_1 - f_0 \qquad \delta^3 f_{1/2} = \delta^2 f_1 - \delta^2 f_0 \qquad \cdots \tag{4.7.3}$$

this formula becomes

$$\begin{aligned}
f_s = (1 - s) f_0 &- \frac{s(s-1)(s-2)}{3!} \delta^2 f_0 - \frac{(s+1)s(s-1)(s-2)(s-3)}{5!} \delta^4 f_0 - \cdots \\
&- \frac{(s+m-1)(s+m-2) \cdots (s-m-1)}{(2m+1)!} \delta^{2m} f_0 \\
+ s f_1 &+ \frac{(s+1)s(s-1)}{3!} \delta^2 f_1 + \frac{(s+2)(s+1)s(s-1)(s-2)}{5!} \delta^4 f_1 + \cdots \\
&+ \frac{(s+m)(s+m-1) \cdots (s-m)}{(2m+1)!} \delta^{2m} f_1 + E_s
\end{aligned} \tag{4.7.4}$$

where $E_s$ is given by (4.7.2). The interpolation formula resulting from neglect
of $E_s$ is known as Everett’s first formula (or often merely as Everett’s formula).
In place of using the differences $\mu\delta^{2m} f_{1/2}$ and $\delta^{2m+1} f_{1/2}$ which are present
in Bessel’s formula, it uses the differences $\delta^{2m} f_0$ and $\delta^{2m} f_1$. However, it is seen
that the result of terminating Bessel’s formula with the (2m + 1)th difference
must give a result identical with that of terminating Everett’s first formula with
the two (2m)th differences, for both of these formulas are equivalent to the Gauss
forward formula terminated with the (2m + 1)th difference, as may be verified
directly by comparison of the error terms.

Whereas the same number of terms must be evaluated in using the two 
formulas just mentioned, if tables are available which include differences of 
orders, say, two and four, then the use of the Everett formula permits a cal- 
culation taking into account all differences through the fifth without the need 
of differencing on the part of the computer. A brief table of the coefficients is 
provided in Sec. 4.12 (see references in Sec. 4.13 for more elaborate tables). 

In a similar way, a formula involving only differences of odd order can 
be obtained from the Gauss forward formula terminated with an even difference 
(see Prob. 19). The result is known as Everett’s second formula (often also as 
Steffensen’s formula), but it has not found much favor in practice. The result of 
terminating it with the (2m + 1)th differences is equivalent to that of terminating 
Stirling’s formula with the (2m + 2)th difference. A brief table of its coefficients 
is provided in Sec. 4.12. 

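If we introduce the notation of (4.3.10), Everett’s first formula can be
put in the form

$$\begin{aligned}
f_s \approx{} & (1 - s) f_0 + \binom{(1 - s) + 1}{3} \delta^2 f_0 + \binom{(1 - s) + 2}{5} \delta^4 f_0 + \cdots \\
& + s f_1 + \binom{s + 1}{3} \delta^2 f_1 + \binom{s + 2}{5} \delta^4 f_1 + \cdots
\end{aligned} \tag{4.7.5}$$

so that the coefficients of one line are obtained by replacing s by 1 - s in those
of the other line.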
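The form (4.7.5) is particularly simple to program when the even differences are supplied with the table. In the following sketch the names `binom` and `everett` are hypothetical, and the numerical entries are assumed five-place values of sin x at x = 1.2 and 1.3 (h = 0.1) together with their second and fourth central differences.

```python
def binom(s, k):
    """Generalized binomial coefficient s(s-1)...(s-k+1)/k! of (4.3.10)."""
    c = 1.0
    for j in range(k):
        c *= (s - j) / (j + 1)
    return c


def everett(even0, even1, s):
    """Everett's first formula (4.7.5), error term omitted.

    even0 = [f_0, δ²f_0, δ⁴f_0, ...] and even1 = [f_1, δ²f_1, δ⁴f_1, ...].
    """
    total = 0.0
    for m, (d0, d1) in enumerate(zip(even0, even1)):
        total += binom((1 - s) + m, 2 * m + 1) * d0  # line belonging to f_0
        total += binom(s + m, 2 * m + 1) * d1        # line belonging to f_1
    return total


# Assumed sample data: sin x at x_0 = 1.2 and x_1 = 1.3 with tabulated even differences.
row0 = [0.93204, -0.00931, 0.00008]
row1 = [0.96356, -0.00963, 0.00010]
print(round(everett(row0, row1, 0.2), 5))  # approximates sin 1.22
```

No differencing is required of the user here; only the two tabular lines adjacent to the interpolant are consulted.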


4.8 Use of Interpolation Formulas 


In general terms (and, for example, in relation to digital computers), the Newton 
forward- and backward-difference formulas (and their derivatives, integrals, 
and other relatives) are of importance for the purpose of generating and studying 
sequences of approximations to functions (and to their derivatives, integrals, 
and other properties) which correspond to collocation at successive sets of 
equally spaced points, each set including the preceding one, when new points are 
to be introduced in always increasing or always decreasing algebraic order. 
The central-difference formulas of Stirling and Bessel permit the generation
and study of such sequences when successive new points of collocation are to
be introduced on both sides of a reference point or tabular interval of interest
at increasing distances therefrom.

In each of these situations, when information has been obtained relative 
to the number of collocation points needed for a certain purpose, it may be that 
the actual calculation is most efficiently effected by means of formulas of 
lagrangian type, particularly when a digital computer is to be used for that 
purpose. 

Whereas a similar end result is provided without the use of differences 
by recursive processes such as Aitken interpolation and Romberg integration, 
the use of an appropriate difference formula may afford information with 
respect to a whole set of numerical calculations; otherwise each separate com- 
putation will require its own recursion. When only a single computation is 
to be made, or when available analytical information permits the use of an 
error term for the purpose of determining how much data will suffice, clearly 
the preceding considerations are of less importance. 

In relation to desk calculation (for example, the use of slide rule or desk 
calculator), the difference formulas usually are convenient, not only for pre- 
liminary error estimation, but also for actual computation. The remainder of 
this section is principally oriented in this direction. 

As mentioned earlier, the Newton formulas with forward or backward 
differences are most appropriate for calculation near the beginning or end, 
respectively, of a tabulation, and their use is mainly restricted to such situations. 
The Gauss forward and backward formulas terminated with an even difference 
are equivalent to each other and to the Stirling formula terminated with the 
same difference. The Gauss forward formula terminated with an odd difference 
is equivalent to the Bessel formula terminated with the same difference. The 
Gauss backward formula launched from $x_0$, and terminating with an odd
difference, is equivalent to the Bessel formula launched from $x_{-1}$, terminated
with the same difference. Thus, in place of using a Gauss formula, one may
always use an equivalent Stirling or Bessel formula, for which the coefficients 
are extensively tabulated. 

Reference to (4.5.4) shows that the coefficients of all differences of even
order in Stirling’s formula involve $s^2$ as a factor. Thus, for interpolation near
$x_0$, it may be expected that the result of terminating that formula with a mean
odd difference $\mu\delta^{2m+1} f_0$ will be nearly as accurate, on the average, as the
result of retaining one additional difference. However, the relative complexity
of the remainder term (4.5.6) in that situation is somewhat of a disadvantage
when a precise error bound is required.

A comparison of (4.5.6) and (4.6.4) shows that, in addition to common
factors, the Stirling error involves the factor

$$\tfrac12 \left[ (m + 1 + s) f^{(2m+2)}(\xi_2) - (m + 1 - s) f^{(2m+2)}(\xi_1) \right]$$

whereas the Bessel error involves the factor

$$-(m + 1 - s) f^{(2m+2)}(\xi)$$

If it is known only that $|f^{(2m+2)}(x)| \le M$ for $x_{-m-1} \le x \le x_{m+1}$, the Stirling
factor can be guaranteed only not to exceed (m + 1)M in magnitude, whereas
the Bessel factor cannot exceed (m + 1 - s)M in magnitude, if extrapolation
is excluded. Thus, from the point of view of predictable error bounds, Bessel’s
formula actually displays a slight advantage when the highest difference to be
retained is odd, in spite of the fact that Stirling’s formula then makes use of
information afforded by an additional ordinate.† In any case, the Stirling
formula is most efficient (in general) for small s, say, for $-\tfrac14 \le s \le \tfrac14$, that is,
for calculation between $x_0 - h/4$ and $x_0 + h/4$.

A similar comparison of (4.5.5) and (4.6.5) indicates that, whereas the
result of truncating the Bessel formula with a mean even difference makes use
of more information than does the Stirling formula truncated with the
corresponding ordinary even difference, the use of the latter formula may actually
be slightly preferable from the point of view of predictable error bounds when
the highest difference to be retained is even. In any case, the Bessel formula
is most efficient (in general) near $s = \tfrac12$, say, for $\tfrac14 \le s \le \tfrac34$, that is, for
calculation between $x_0 + h/4$ and $x_1 - h/4$.

In a series of calculations based on a given set of data, it is inconvenient 
to shift from one of these two formulas to the other, and one of the two must be 
chosen. Given a set of data, a decision might be made first as to the highest 
difference which was to be retained. If that difference were of even order, 
Stirling’s formula perhaps would be recommended; if it were of odd order,
Bessel’s formula might be preferred.‡ However, the difference in accuracy
between the two formulas is usually small, so that the choice is usually dictated 
by personal preference. 

Everett’s first formula is particularly useful when auxiliary tables of certain 
even differences accompany the given data; Everett’s second formula would 
be useful if its coefficients were tabulated and if auxiliary odd differences were 
available. 


† If it is known (for example) that $f^{(2m+2)}(x)$ is of constant sign in the relevant interval,
the advantage clearly is generally reversed.

‡ The fallibility of such generalizations is illustrated by a comparison of the results of
Probs. 22 and 23.




Many tables of special functions exist, with certain even differences
provided, in which the entries are determined to a large number of places, but
in which a large uniform spacing h is used. Such tables store tremendous
amounts of potential data in volumes of modest size for the user who employs
Everett interpolation, retaining a number of differences (and a corresponding
number of significant digits) consistent with his error tolerance. In addition,
similar comments apply to the storage of special functions in a digital computer,
a relatively small data storage sufficing for the specification of the function over
a large range, in association with the use of an Everett interpolation routine.
Further economizations in table formation or data storage can be effected by
the use of techniques to be considered in Sec. 4.10.†

To illustrate the use of the various formulas, we consider the following
difference table, based on five-place data taken from a table of f(x) = sin x,
where the differences are given in units of the fifth place and where the figures
in parentheses are auxiliary mean central differences used in the calculations
to be described:


 x      f(x)         Δf        Δ²f      Δ³f     Δ⁴f    Δ⁵f
1.0    0.84147
                    4974
1.1    0.89121                -891
                    4083                -40
1.2    0.93204   (3617.5)     -931     (-36)      8
      (0.94780)     3152     (-947)     -32      (9)     2
1.3    0.96356                -963               10
                    2189                -22              1
1.4    0.98545                -985               11
                    1204                -11             -3
1.5    0.99749                -996                8
                     208                 -3              4
1.6    0.99957                -999               12
                    -791                  9
1.7    0.99166                -990
                   -1781
1.8    0.97385


† Frequently used functions on specific intervals of more modest extent are more often
stored in computers by means of coefficients specifying other types of approximations
(to be considered in Chap. 9).


A convenient check on the differencing effected in any difference table
consists of the fact that the sum of the entries in any column of differences should
equal the difference between the last and first entries of the preceding column.
To see that this is so, suppose that the entries in a certain column, reading
downward, are $u_1, u_2, u_3, \ldots, u_n$. Then the corresponding entries in the next
column to the right are $(u_2 - u_1), (u_3 - u_2), \ldots, (u_{n-1} - u_{n-2})$, and
$(u_n - u_{n-1})$, and the sum of these quantities evidently “telescopes” into
$u_n - u_1$.
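The telescoping check is readily mechanized when a hand-computed table is being transcribed. In the following sketch the function name `column_checks` is hypothetical; the entries are the ordinates and the first two difference columns of the sine table above, in units of the fifth place, and the altered entry in `d1_bad` is a deliberately introduced slip.

```python
def column_checks(columns):
    """For each difference column, test sum(column) == last - first of the preceding column."""
    return [
        sum(columns[r]) == columns[r - 1][-1] - columns[r - 1][0]
        for r in range(1, len(columns))
    ]


# Ordinates and first two difference columns, in units of the fifth place.
f = [84147, 89121, 93204, 96356, 98545, 99749, 99957, 99166, 97385]
d1 = [4974, 4083, 3152, 2189, 1204, 208, -791, -1781]
d2 = [-891, -931, -963, -985, -996, -999, -990]
print(column_checks([f, d1, d2]))

# A transcription slip (3152 entered as 3125) is caught by the check on the first column.
d1_bad = d1.copy()
d1_bad[2] = 3125
print(column_checks([f, d1_bad, d2]))
```

It may be noticed that the check on the second column still succeeds in the second case, since it involves only the first and last entries of the first-difference column; the test therefore detects an isolated slip in the column being summed, but not every possible blunder.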

Because of the irregular fluctuation of the fifth differences in the given 
table, we would suppose that they are not significant but that they principally 
reflect the propagated effects of roundoff errors present in the given data (see 
also Sec. 4.9). In fact, it would be suspected that the fluctuation of the fourth 
differences about their mean value of about 10 is also principally due to these 
inherent errors in the original data. Thus, not more than the first four differences 
are to be used here. Whether these differences are sufficient, and whether they 
are all needed, could be determined from the error term associated with the 
formula to be used if knowledge of the analytical form of f(x) were presumed. 

In order to interpolate for f(1.02), we would use Newton’s forward- 
difference formula, with s = 0.2: 


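$$
\begin{aligned}
f(1.02) &\approx 0.84147 + 0.2(0.04974) + \frac{(0.2)(-0.8)}{2}(-0.00891) \\
&\quad + \frac{(0.2)(-0.8)(-1.8)}{6}(-0.00040)
       + \frac{(0.2)(-0.8)(-1.8)(-2.8)}{24}(0.00008) \\
&= 0.84147 + 0.009948 + 0.000713 - 0.000019 - 0.000003 \\
&= 0.852109 \approx 0.85211
\end{aligned}
$$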


which is correct to five places. The calculation is considerably simplified if use 
is made of coefficient tables (see Sec. 4.12): 


$$
\begin{aligned}
f(1.02) &\approx 0.84147 + (0.2)(0.04974) + (-0.08)(-0.00891) \\
&\quad + (0.048)(-0.00040) + (-0.0336)(0.00008) \\
&= 0.85211
\end{aligned}
$$


The interpolation for f(1.75) would be accomplished by use of Newton's
backward-difference formula (using coefficient tables), with s = -0.5:

$$
\begin{aligned}
f(1.75) &\approx 0.97385 + (-0.5)(-0.01781) + (-0.125)(-0.00990) \\
&\quad + (-0.0625)(0.00009) + (-0.03906)(0.00012) \\
&= 0.98398
\end{aligned}
$$


the rounded value being in error by defect of one unit in the fifth place. 
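The two Newton interpolations above can be reproduced with a few lines of code. This is only an illustrative sketch: the function names `binom`, `newton_forward`, and `newton_backward` are hypothetical, the differences are read from the table of this section, and s is measured from the tabular point at which the formula is launched (so s is negative for the backward formula).

```python
def binom(s, k):
    """Generalized binomial coefficient s(s-1)...(s-k+1)/k! for real s."""
    c = 1.0
    for j in range(k):
        c *= (s - j) / (j + 1)
    return c


def newton_forward(f0, diffs, s):
    """f0 + sum_k C(s, k) * Delta^k f0, with diffs = [Delta f0, Delta^2 f0, ...]."""
    return f0 + sum(binom(s, k) * d for k, d in enumerate(diffs, start=1))


def newton_backward(fN, diffs, s):
    """fN + sum_k C(s+k-1, k) * Nabla^k fN, with diffs = [Nabla fN, Nabla^2 fN, ...]."""
    return fN + sum(binom(s + k - 1, k) * d for k, d in enumerate(diffs, start=1))


# f(1.02): launched from x0 = 1.0 with s = 0.2, first four forward differences.
f_102 = newton_forward(0.84147, [0.04974, -0.00891, -0.00040, 0.00008], 0.2)

# f(1.75): launched from xN = 1.8 with s = -0.5, first four backward differences.
f_175 = newton_backward(0.97385, [-0.01781, -0.00990, 0.00009, 0.00012], -0.5)

print(round(f_102, 5), round(f_175, 5))
```

The second-difference coefficient for the backward formula at s = -0.5 is s(s + 1)/2 = -0.125, in agreement with the entry C2(0.5) of the Newton coefficient table in Sec. 4.12.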

In order to interpolate for f(1.22), we could use either Stirling's formula
or Bessel's formula, with $x_0 = 1.2$ and s = 0.2 in either case. Since the formula
is to terminate with even differences (and also since the interpolant is nearer
s = 0 than s = 1/2), Stirling's formula might be preferred. After inserting the
mean odd differences indicated in parentheses in the row x = 1.2 of the difference
table, the use of Stirling's formula gives


$$
\begin{aligned}
f(1.22) &\approx 0.93204 + (0.2)(0.036175) + (0.02)(-0.00931) \\
&\quad + (-0.032)(-0.00036) + (-0.0016)(0.00008) \\
&= 0.93910
\end{aligned}
$$


whereas, after inserting appropriate mean even differences in the table, the use 
of Bessel’s formula gives 


$$
\begin{aligned}
f(1.22) &\approx 0.94780 + (-0.3)(0.03152) + (-0.08)(-0.00947) \\
&\quad + (0.008)(-0.00032) + (0.0144)(0.00009) \\
&= 0.93910
\end{aligned}
$$


Both results are correct to five places. We see that both formulas would in fact 
give results correct to five places if only third differences were retained. 

In a table providing $\delta^2 f$ and $\delta^4 f$, the entries used in the
interpolation for x = 1.22 by Everett's first formula would read


   x      f(x)      δ²f    δ⁴f
  1.2   0.93204    -931      8
  1.3   0.96356    -963     10


and the calculation would be of the form 


$$
\begin{aligned}
f(1.22) &\approx 0.8(0.93204) + (-0.048)(-0.00931) + (0.00806)(0.00008) \\
&\quad + 0.2(0.96356) + (-0.032)(-0.00963) + (0.00634)(0.00010) \\
&= 0.745632 + 0.000447 + 0.000001 \\
&\quad + 0.192712 + 0.000308 + 0.000001 \\
&= 0.939101 \approx 0.93910
\end{aligned}
$$


in agreement with the preceding results. The additional computation here is 
because of the fact that Everett’s formula with fourth differences actually 
incorporates the effects of the first five differences. In this case, it is seen that 
the retention of only the two second differences would have been sufficient. 
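The Everett calculation above, with and without the fourth differences, can be checked as follows. This is a sketch under the assumption that the coefficients are those of the Everett table in Sec. 4.12, written here in closed form; the function name `everett` and its argument order are hypothetical.

```python
def everett(f0, f1, d2_0, d2_1, d4_0, d4_1, s):
    """Everett's first formula through fourth differences, 0 <= s <= 1."""
    t = 1.0 - s

    def e2(u):
        # coefficient u(u^2 - 1)/3! multiplying a second difference
        return u * (u * u - 1.0) / 6.0

    def e4(u):
        # coefficient u(u^2 - 1)(u^2 - 4)/5! multiplying a fourth difference
        return u * (u * u - 1.0) * (u * u - 4.0) / 120.0

    return (t * f0 + e2(t) * d2_0 + e4(t) * d4_0
            + s * f1 + e2(s) * d2_1 + e4(s) * d4_1)


# Data rows for x = 1.2 and x = 1.3; interpolation at x = 1.22, so s = 0.2.
full = everett(0.93204, 0.96356, -0.00931, -0.00963, 0.00008, 0.00010, 0.2)
second_only = everett(0.93204, 0.96356, -0.00931, -0.00963, 0.0, 0.0, 0.2)

print(round(full, 6), round(second_only, 6))
```

Both results round to 0.93910, which illustrates the remark that the two second differences alone would have sufficed here.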

Since the analytical expression for f(x) is known, this last situation could
have been predicted by reference to the error formula (4.7.2) which, with h = 0.1,
s = 0.2, and m = 1, gives

$$
E = 10^{-4}\,\frac{(0.2)(-0.96)(-1.8)}{24}\,f^{iv}(\xi)
  = 1.44 \times 10^{-6} f^{iv}(\xi)
$$

Since here $f^{iv}(x) = \sin x$, there follows $|f^{iv}(\xi)| < 1$, so that (if no
roundoff errors were present) the error resulting from terminating Everett's first
formula with second differences would be less than two units in the sixth place.
Similar error estimates could have been obtained, in advance, with reference to
the other calculations.

Formulas for numerical differentiation and integration may be obtained 
by differentiating and integrating any of the interpolation formulas (see Probs. 
5, 13, and 16). However, these formulas can be obtained somewhat more 
systematically by operational methods, and their treatment accordingly is 
postponed to the following chapter. 


4.9 Propagation of Inherent Errors 


In addition to the truncation errors, for which certain analytical expressions 
have been given, the effects of roundoff errors in the given data, and in the 
computation, must be taken into account. The latter generally can be regulated 
by retaining one or more extra figures in the intermediate calculation. It thus 
remains to investigate the way in which roundoff errors in the given data affect 
the interpolation process. 

The error in the interpolant, corresponding to such inherent errors, clearly
is merely a linear combination of the errors in the ordinates involved in the
interpolation. When the interpolation polynomial is of degree n and is
determined by exact fit to the given data at n + 1 points, the constants of
combination are the Lagrange coefficients considered in Chap. 3. In particular,
if the error in each given ordinate cannot exceed ε, then the error in the
interpolant cannot exceed the product of ε and the sum of the absolute values of
the relevant Lagrange coefficients. Three-point coefficients, corresponding to
retention of second differences, are tabulated to tenths in Sec. 3.4, whereas a
similar table of five-point coefficients, corresponding to retention of fourth
differences, is presented in Sec. 4.12. Use of the latter table shows, for
example, that the error in an interpolation at an abscissa midway between the
third and fourth of the five relevant abscissas, due only to data errors not
exceeding ε in magnitude, cannot exceed 1.4ε in magnitude.
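The amplification factor quoted for the five-point case can be recomputed directly from the Lagrange coefficients. The sketch below is illustrative only; the function name `lagrange_coefficients` is hypothetical, and the abscissas are taken at s = -2, -1, 0, 1, 2 as in the five-point table of Sec. 4.12.

```python
def lagrange_coefficients(s, nodes=(-2, -1, 0, 1, 2)):
    """Lagrange coefficients L_i(s) for equally spaced nodes, in units of the spacing."""
    coeffs = []
    for i in nodes:
        c = 1.0
        for j in nodes:
            if j != i:
                c *= (s - j) / (i - j)
        coeffs.append(c)
    return coeffs


# Midway between the third and fourth abscissas corresponds to s = 0.5.
coeffs = lagrange_coefficients(0.5)

# If every ordinate may be in error by at most eps, the interpolant may be
# in error by at most eps times the sum of the absolute coefficients.
amplification = sum(abs(c) for c in coeffs)
print(coeffs, amplification)
```

The sum of absolute values is 1.390625, which rounds to the factor 1.4 stated above.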

The Stirling formula, when terminated with a mean odd difference, and 
the Bessel formula, when terminated with a mean even difference, are not 
based on interpolation polynomials which fit the data at n + 1 points, and 
hence must be analyzed separately. This is an additional reason for avoiding 
the termination of the Stirling and Bessel formulas with mean differences, when 
precise error bounds are desired. 

The presence of roundoff errors in given data also bears on the question of
how many differences should be retained in an interpolation. For this reason,
it is of interest to study the propagation of the effects of such errors into
the differences themselves.

Suppose first that a single initial entry is in error by an excess ε, due
perhaps to rounding. Then, if all other initial entries are assumed to be exact,
it is seen that the effects of this error will be propagated into the first five
differences of the difference table as follows:


   f     Δf     Δ²f    Δ³f    Δ⁴f    Δ⁵f

                        ε            -5ε
                 ε            -4ε
          ε            -3ε           10ε
   ε            -2ε            6ε
         -ε             3ε          -10ε
                 ε            -4ε
                       -ε             5ε


This characteristic distribution along a column, in which the successive
errors alternate in sign and, indeed, vary along the column of rth differences
as the binomial coefficients associated with $(1 - x)^r$, frequently serves to permit
one to discover and correct a gross error in a table.


    f       Δf     Δ²f    Δ³f
  1.203             18
           221             18
  1.424             36
           257             18
  1.681             54
           311             22
  1.992             76
           387              6
  2.379             82
           469             30
  2.848            112
           581             14
  3.429            126
           707             18
  4.136            144


Thus, for example, the third differences in the accompanying table appear
to fluctuate irregularly. Their mean value is 18, and the successive deviations
from the mean, reading downward, are 0, 0, 4, -12, +12, -4, 0. Thus an
excess ε = 4 in the last place is indicated in the entry 2.379, which occupies
the row separating the maximum deviations. The corrected value is 2.375.
A fourth differencing would have given the entries 0, 4, -16, 24, -16, 4, from
which the same conclusion would be drawn. When several errors are present,
their discovery may be much more difficult.
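The detection procedure just illustrated can be mimicked in code. What follows is a rough sketch, not a method given in the text: it assumes a single isolated error, uses only the eight tabulated ordinates shown (in units of the third decimal place), and fits the pattern ε(1, -3, 3, -1) to the deviations of the third differences by least squares. The names `diffs` and `locate_single_error` are hypothetical.

```python
def diffs(values, order):
    """Differences of the given order of a list of equally spaced ordinates."""
    v = list(values)
    for _ in range(order):
        v = [b - a for a, b in zip(v, v[1:])]
    return v


def locate_single_error(f):
    """Return (index, eps) of the isolated error that best explains the third differences."""
    d3 = diffs(f, 3)
    mean = sum(d3) / len(d3)
    dev = [d - mean for d in d3]
    pattern = (1, -3, 3, -1)  # effect of an error eps in f[m] on d3[m-3 .. m]
    best = None
    for m in range(3, len(d3)):
        window = dev[m - 3:m + 1]
        eps = sum(p * d for p, d in zip(pattern, window)) / 20.0  # least squares
        resid = list(dev)
        for k, p in enumerate(pattern):
            resid[m - 3 + k] -= eps * p
        score = sum(r * r for r in resid)
        if best is None or score < best[0]:
            best = (score, m, eps)
    return best[1], best[2]


f = [1203, 1424, 1681, 1992, 2379, 2848, 3429, 4136]
m, eps = locate_single_error(f)
corrected = list(f)
corrected[m] -= round(eps)
print(m, eps, corrected[m], diffs(corrected, 3))
```

With these data the sketch singles out the entry 2.379 with ε = 4, and the third differences of the corrected table are all equal to 18.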

Suppose now that all initial entries may be in error by amounts between
-ε and ε. The most unfavorable situation, with regard to effects on differences,
is that in which the successive errors are as large as possible but are of alternating
sign. The error-propagation table, through fourth differences, then appears as
follows:


   f     Δf     Δ²f    Δ³f    Δ⁴f

   ε            -4ε           16ε
        -2ε             8ε
  -ε             4ε          -16ε
         2ε            -8ε
   ε            -4ε           16ε
        -2ε             8ε
  -ε             4ε          -16ε
         2ε            -8ε
   ε            -4ε           16ε


Thus, it follows that errors varying between -ε and ε in the initial data will
lead to errors varying between $-2^r\varepsilon$ and $2^r\varepsilon$ in the rth
differences. Here, for example, if the initial data are correctly rounded to k
decimal places, $\varepsilon = 5 \times 10^{-k-1}$.
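The growth factor $2^r$ can be confirmed by differencing the worst-case error pattern directly. This is a small illustrative sketch; the helper name `diffs` is hypothetical, and errors are expressed in units of ε.

```python
def diffs(values, order):
    """Differences of the given order of a list of equally spaced entries."""
    v = list(values)
    for _ in range(order):
        v = [b - a for a, b in zip(v, v[1:])]
    return v


# Worst case: errors of full size eps with alternating sign (in units of eps).
errors = [(-1) ** i for i in range(12)]

growth = [max(abs(e) for e in diffs(errors, r)) for r in range(1, 6)]
print(growth)  # largest error, in units of eps, in differences of order 1..5

# For data rounded to k decimals, eps is half a unit in the kth place, so the
# possible noise in the rth difference is 2**(r - 1) units of that place.
noise_units = [g * 0.5 for g in growth]
print(noise_units)
```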

Because of this possible error growth, it usually happens in practice
that calculated differences beyond a certain order are no longer significant.
That is, there exists a certain "noise level" such that the effects of initial
roundoffs are of the same order of magnitude as the differences which would have
been obtained had the initial data been exact. If the initial data are rounded, from
exactly known data, to k decimal places, then roundoff errors of magnitude
$2^{r-1}/10^k$ are possible in the rth differences. Hence rth differences of magnitude
appreciably smaller than $2^{r-1}/10^k$ are likely to consist largely of "noise."
Thus, since k = 5 in the data used for the examples of the preceding section,
noise of magnitude 1, 2, 4, 8, and 16 units in the fifth place could occur in the
respective differences of order one through five, although the probability of
noise of nearly maximum magnitude in the rth difference is clearly small and
will decrease rapidly as r increases. In any case, it would be expected that, since
the fifth differences in that table are small relative to the permissible noise,
they are meaningless, so that the fluctuation of the fourth differences about their
mean may also lack significance, in the sense that the replacement of those
differences by their mean value would lead to errors in interpolation of the
same order as the errors which are present in the given data.


4.10 Throwback Techniques


A useful procedure, due to Comrie [1956], frequently permits a table user, 
table maker, or computer program effectively to take into account a neglected 
difference by modifying certain of the differences actually retained in an inter- 
polation formula. 

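In illustration, Everett's first formula, terminated with fourth differences,
can be expressed in the form

$$
\begin{aligned}
f_s \approx{} & (1 - s) f_0 - \frac{s(s - 1)(s - 2)}{3!}
   \left[\delta^2 f_0 + \frac{(s + 1)(s - 3)}{20}\,\delta^4 f_0\right] \\
 & + s f_1 + \frac{(s + 1)s(s - 1)}{3!}
   \left[\delta^2 f_1 + \frac{(s + 2)(s - 2)}{20}\,\delta^4 f_1\right]
\end{aligned}
\qquad (4.10.1)
$$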


On the interval 0 < s < 1, the factors

$$
\frac{(s + 1)(s - 3)}{20} \quad\text{and}\quad \frac{(s + 2)(s - 2)}{20}
$$

both vary only from $-\tfrac{3}{20}$ to $-\tfrac{1}{5}$. This fact suggests that these
factors be replaced by a constant value over that interval in (4.10.1). The value
suggested by Comrie, -0.184, differs only slightly from the mean value
$(-\tfrac{11}{60})$ of each factor.† Hence, if we define the modified second difference

$$
\delta_m^2 f_k = \delta^2 f_k - 0.184\,\delta^4 f_k \qquad (4.10.2)
$$

Everett's formula with fourth differences may be approximated by the formula

$$
f_s \approx (1 - s) f_0 + s f_1 - \frac{s(s - 1)(s - 2)}{3!}\,\delta_m^2 f_0
   + \frac{(s + 1)s(s - 1)}{3!}\,\delta_m^2 f_1 \qquad (4.10.3)
$$


It is conventional to speak of (4.10.2) as “throwing back the fourth difference 
on the second.” 
The error associated with the introduction of this approximation is given by

$$
-\left[\binom{s + 1}{5} + 0.184\binom{s}{3}\right]\delta^4 f_0
+ \left[\binom{s + 2}{5} + 0.184\binom{s + 1}{3}\right]\delta^4 f_1 \qquad (4.10.4)
$$

and calculation shows that, when 0 < s < 1, this sum is not larger in magnitude
than 0.00122M, where M is the larger of $|\delta^4 f_0|$ and $|\delta^4 f_1|$. Hence, if
$\delta^4 f_0$ and $\delta^4 f_1$ do not exceed 400 units in the last decimal place
retained in the overall calculation, the error committed in the throwback cannot
exceed 1/2 unit in that place; if $\delta^4 f_0$ and $\delta^4 f_1$ are of common sign,
then 1,000 units are permissible (see Prob. 28).

† The figure $(\sqrt{2} + 3)/24 \approx 0.184$ was obtained by Comrie as that value of
the factor for which the magnitude of the maximum error due to throwback is least
in the case of Bessel's formula. The same figure is conventionally used with the
Everett formula (see also Probs. 27 and 28).

In such situations, only the associated modified second difference $\delta_m^2 f_k$
need be tabulated (or stored, in a computer) together with each ordinate $f_k$,
and Everett interpolation based on only those data effectively takes into account
all differences through the fifth.

If the same throwback (4.10.2) is effected in Bessel's formula, so that
$\mu\delta^2 f_{1/2}$ is replaced by $\mu\delta_m^2 f_{1/2}$, and differences beyond the
third are omitted, the effect of the omitted fourth difference is properly taken
into account, in the same sense, if it does not exceed 1,000 units in the last
decimal place retained.
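The bound 0.00122M quoted for the Everett throwback can be checked numerically. The sketch below assumes that the two error coefficients are those of (4.10.4), written with generalized binomial coefficients, and simply scans a grid on 0 ≤ s ≤ 1; the names `binom` and `throwback_error_coefficients` are hypothetical.

```python
import math


def binom(s, k):
    """Generalized binomial coefficient s(s-1)...(s-k+1)/k! for real s."""
    c = 1.0
    for j in range(k):
        c *= (s - j) / (j + 1)
    return c


def throwback_error_coefficients(s, c=0.184):
    """Coefficients of the fourth differences at f_0 and f_1 in the throwback error."""
    g0 = -(binom(s + 1, 5) + c * binom(s, 3))
    g1 = binom(s + 2, 5) + c * binom(s + 1, 3)
    return g0, g1


grid = [i / 1000.0 for i in range(1001)]

# Worst case when the two fourth differences may have opposite signs.
worst = max(abs(g0) + abs(g1)
            for g0, g1 in map(throwback_error_coefficients, grid))

# Comrie's constant, as given in the footnote.
comrie = (math.sqrt(2.0) + 3.0) / 24.0

print(worst, 400 * worst, round(comrie, 3))
```

On this grid the largest coefficient sum is close to 0.001216, so that fourth differences of 400 units contribute a little less than half a unit, consistent with the statement above.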

Generalized techniques, relevant to higher differences or to two or more 
differences, have also been devised by Comrie [1956]. Similar techniques have 
been given (see Lidstone [1943]) for Stirling’s formula. 


4.11 Interpolation Series 


Reference has already been made to the fact that the formal infinite series
generated by the interpolation formulas considered in this chapter generally
do not converge as the number n of differences retained is increased without
limit while the spacing h is held fixed. In this section we consider a simple
example which illustrates this fact, and we state certain known results of a more
general nature.

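If the Newton forward-difference formula (4.3.2) (with the error term
suppressed) is considered as a formal infinite series, it can be expressed in the
form

$$
\begin{aligned}
f(x) &\sim f(0) + x\,\frac{\Delta f(0)}{h} + x(x - h)\,\frac{\Delta^2 f(0)}{2!\,h^2} + \cdots \\
     &\sim f(0) + \sum_{k=1}^{\infty} \frac{\Delta^k f(0)}{k!\,h^k}\,
        \bigl[x(x - h)\cdots(x - (k-1)h)\bigr]
\end{aligned}
\qquad (4.11.1)
$$

where we have supposed that the origin has been chosen such that $x_0 = 0$.
Similarly, the formal Stirling interpolation series is expressible in the form

$$
\begin{aligned}
f(x) &\sim f(0) + x\,\frac{\mu\delta f(0)}{h} + x^2\,\frac{\delta^2 f(0)}{2!\,h^2} + \cdots \\
     &\sim f(0) + \sum_{k=1}^{\infty} \frac{1}{(2k-1)!\,h^{2k-1}}
        \left[x\,\mu\delta^{2k-1} f(0) + \frac{x^2}{2kh}\,\delta^{2k} f(0)\right] \\
     &\qquad \times \bigl[(x^2 - h^2)(x^2 - 4h^2)\cdots(x^2 - (k-1)^2 h^2)\bigr]
\end{aligned}
\qquad (4.11.2)
$$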


and the remaining formulas correspond to similar interpolation series. The
basic problem considered here is that of determining when such a series actually
converges to the generating function f(x).

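In the special case when

$$
f(x) = e^{ax} \qquad (4.11.3)
$$

there follows

$$
\Delta f(x) = e^{a(x+h)} - e^{ax} = (e^{ah} - 1) f(x) \qquad
\Delta^2 f(x) = (e^{ah} - 1)^2 f(x)
$$

and hence also

$$
\Delta^k f(0) = (e^{ah} - 1)^k \qquad (4.11.4)
$$

Thus the formal Newton interpolation series for $e^{ax}$ may be obtained in the
form

$$
e^{ax} \sim 1 + \sum_{k=1}^{\infty} \frac{(e^{ah} - 1)^k}{k!\,h^k}\,
   \bigl[x(x - h)\cdots(x - (k-1)h)\bigr] \qquad (4.11.5)
$$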


This series terminates, and represents $e^{ax}$ correctly, when x is zero or a
positive integral multiple of h. In order to investigate its convergence when the
series is infinite, we notice first that the ratio of successive terms in this series is
given by

$$
\frac{(e^{ah} - 1)(x - kh)}{h(k + 1)}
$$

and that, as $k \to \infty$, this ratio tends to $-(e^{ah} - 1)$, for all values of x.
Thus we may deduce that the series (4.11.5) converges if $|e^{ah} - 1| < 1$ or
$e^{ah} < 2$, whereas, if $x \ne 0, h, 2h, \ldots$, it diverges when $|e^{ah} - 1| > 1$
or $e^{ah} > 2$. When $e^{ah} = 2$, the series reduces to

$$
1 + \sum_{k=1}^{\infty} \frac{x(x - h)\cdots(x - (k-1)h)}{h^k\,k!}
$$

the successive terms of which alternate in sign when k is sufficiently large.
Now the kth term can be written, in terms of the gamma function, in the form

$$
(-1)^k\,\frac{\Gamma(k - x/h)}{\Gamma(-x/h)\,k!}
$$

By making use of the fact that $\Gamma(k + u)$ is approximated by $k!\,k^{u-1}$, for
large k, we find that this last ratio is approximated by

$$
(-1)^k\,\frac{k^{-x/h - 1}}{\Gamma(-x/h)}
$$

when k is large, and hence that it tends to zero as $k \to \infty$ if and only if
$x > -h$.

It follows that the series (4.11.5) converges for all finite values of x if
ah < log 2 and diverges for all values of x which differ from 0, h, 2h, ... if
ah > log 2, and that, if ah = log 2, the series converges when and only when
x > -h. It can be proved that, when the series converges, the convergence is
indeed to $e^{ax}$.

If x is replaced by a complex variable z (but a is real), the preceding
developments are unchanged except for the fact that, when ah = log 2, the
region of convergence is that half of the complex z plane for which the real
part of z is greater than -h. If also a is complex, the conditions
$ah \lessgtr \log 2$ must be replaced by $|e^{ah} - 1| \lessgtr 1$.

We see, therefore, that, if the Newton forward-difference formula were
used for interpolating $e^{ax}$ where a > 0, with a spacing h larger than (log 2)/a,
the successive interpolates corresponding to the incorporation of more and more
data eventually would begin to oscillate with increasing amplitude about the
true value. Thus, whereas the retention of an additional term of the interpolation
formula generally would improve the accuracy of the interpolation up to a
certain stage, there would exist a point beyond which the addition of more terms
would correspond to a loss of accuracy, even though no roundoff errors were
present. (A similar situation was encountered in Sec. 3.9.) For ah < log 2
(in particular, for negative a), this situation would not arise. In the intermediate
case when ah = log 2, convergence would follow if and only if the formula
were not used for backward extrapolation beyond x = -h. These results are
particularly remarkable in view of the fact that $e^{az}$ is such a well-behaved function
that its Taylor series, launched from any point, converges for all finite values of z.
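The contrast between the cases ah < log 2 and ah > log 2 can be observed numerically by accumulating the terms of the series (4.11.5). The following sketch is illustrative only: the function name `newton_series_partial_sums` and the particular values of a, h, and x are assumptions chosen for the demonstration, and each term is generated from the preceding one by the ratio of successive terms.

```python
import math


def newton_series_partial_sums(a, h, x, n_terms):
    """Partial sums of the formal Newton series (4.11.5) for exp(a*x)."""
    q = math.exp(a * h) - 1.0
    term = 1.0
    total = 1.0
    sums = [total]
    for k in range(n_terms):
        # ratio of successive terms: q * (x - k*h) / (h * (k + 1))
        term *= q * (x - k * h) / (h * (k + 1))
        total += term
        sums.append(total)
    return sums


# a*h = 0.5 < log 2: the partial sums settle down to exp(a*x).
conv = newton_series_partial_sums(a=1.0, h=0.5, x=0.25, n_terms=60)

# a*h = 1.0 > log 2: the partial sums oscillate with growing amplitude.
div = newton_series_partial_sums(a=1.0, h=1.0, x=0.5, n_terms=60)

print(abs(conv[-1] - math.exp(0.25)))
print(abs(div[10] - math.exp(0.5)), abs(div[-1] - math.exp(0.5)))
```

In the divergent case the first few partial sums may still approach the true value before the error begins to grow, which is the behavior described in the paragraph above.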

A similar analysis, in the case when Stirling's formula is used instead,
leads to the fact that here the corresponding interpolation series converges for
all z when $ah < 2 \log (1 + \sqrt{2})$ and diverges (when it does not terminate)
for all z in all other cases, including the special case when
$ah = 2 \log (1 + \sqrt{2})$. If a is complex, the corresponding condition is
$|\sinh (ah/2)| < 1$.

In the general case, it is known that, if the Stirling series converges for
any value of z in addition to $z = 0, \pm h, \pm 2h, \ldots$ (for which it terminates),
then it converges for all finite values of z (real or complex). Hence, conversely,
if it diverges for any finite value of z, it diverges always unless it terminates.
In the language of the theory of analytic functions of a complex variable, the
Stirling series cannot converge to f(z) for any z (except those for which it
terminates) unless f(z) is a so-called entire function, that is, a function which is
analytic at all finite points of the complex z plane, or, equivalently, a function
whose Taylor series converges everywhere.

But, even though this be the case, the series still may not converge (as in
the preceding example, where $f(z) = e^{az}$). It is also necessary that
$|f(re^{i\theta})| < Me^{ar}$ for large r, where M and a are constants and a is such
that $ah < \pi$. If these conditions are not satisfied, the Stirling series will
diverge everywhere except where it terminates. On the other hand, if f(z) is an
entire function, and if $|f(re^{i\theta})| < Me^{ar}$ for large r, where M and a are
constants and a is such that $ah < 2 \log (1 + \sqrt{2})$, then the series will
converge everywhere. Similar statements apply to the series associated with
Bessel's formula.

In the case of the Newton series, with forward differences, it is known
that, if the series converges to f(z) for any value of z, say, Z, in addition to
z = 0, h, 2h, ..., for which it terminates, then it converges for all values of z
such that the real part of z is greater than the real part of Z. Unless f(z) is
analytic in some right half-plane Re (z) > a, and also $|f(re^{i\theta})| < Me^{ar}$ for
large r, where M and a are constants and a is such that $ah < \pi/2$, the series will
diverge except when it terminates. If f(z) is analytic in such a half-plane and if
also $|f(re^{i\theta})| < Me^{ar}$ for large r, where M and a are constants and a is such
that ah < log 2, then the series will converge everywhere in that half-plane. (For
proofs of these statements, see Nörlund [1926, 1954].)

Thus, for example, the function $f(z) = 1/(1 + z^2)$ is analytic when z
is real, but it possesses poles when $z = \pm i$. Hence the Stirling series will
diverge when it does not terminate. Since this function is analytic in the right
half-plane Re (z) > 0, and since it is dominated in magnitude by any exponential
function $Me^{ar}$ (a > 0) as $r \to \infty$, the Newton series will converge in that
half-plane. Nevertheless, if both series are launched from the same point, the error
in the Stirling series will at first decrease much more rapidly than that associated
with the Newton series, as additional terms are incorporated into the calculation.
Eventually, the result of adding still more terms to the Stirling series will increase
its error, whereas the error in the Newton series will continue to decrease.
However, it is of considerable practical importance that, when h is small, this
point of diminishing return in the Stirling formula is likely to be preceded
either by a stage at which the truncation error has decreased below the tolerance
imposed or by a stage at which the "noise level" is reached, so that the effects
of roundoff errors would cause the remaining higher differences to be
undependable in any case.

Thus, as in many other practical situations, it is quite possible to obtain
more accurate results by terminating an ultimately divergent process at an
appropriate stage than by terminating a convergent process at a corresponding
stage.

It is evident that since each partial sum of either the Newton or Stirling
series represents a polynomial approximation to f(x) corresponding to
collocation at the points involved, the two sequences of approximations differ only
in that the former results from the successive introduction of the ordinates at
x = 0, h, 2h, ..., kh, ..., all of which lie on the half-line $0 \le x < \infty$, whereas
the latter results from the introduction of the ordinates at the points ..., -kh, ...,
-h, 0, h, ..., kh, ..., in such a way that symmetry is preserved about x = 0.


That is, the convergence or divergence of the sequence of approximations truly 
depends upon the sequence of data introduced rather than upon the form in 
which the polynomial interpolation formula employed is written. Whereas an 
indication of the existence of an unfavorable situation usually is afforded by an 
inspection of a relevant difference table, such numerical evidence is not available 
when lagrangian methods are used. 

The sequences of interpolation polynomials considered here correspond to
the incorporation of successive ordinates which eventually are at unboundedly
increasing distances from the point of interpolation. Whereas the sequence
generated by fitting ordinates at points which divide a fixed finite interval
[a, b] into n equal parts and allowing n to increase without limit is more
tractable, there actually exist functions which are continuous on [a, b] but
for which this sequence diverges everywhere inside that interval. For the function
$f(x) = 1/(x^2 + 1)$ on the interval [-5, 5], Runge [1901] established
divergence when |x| > c, where c ≈ 3.63. For the simple function f(x) = |x| on
[-1, 1], Bernstein [1912b] proved that the sequence diverges for all x in
[-1, 1] except x = 0 and $x = \pm 1$. However, when the function f(z) is an
analytic function of the complex variable z in a sufficiently large region including
the real interval $a \le x \le b$ under consideration, convergence is guaranteed
(see Probs. 38 to 43).
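Runge's example is easily reproduced. The sketch below is only an illustration under stated assumptions: it evaluates the equally spaced collocation polynomial by the plain Lagrange formula (adequate for the modest degrees used here), and the names `lagrange_interpolate`, `runge`, and `equispaced_error`, as well as the sample abscissas 0.25 and 4.75, are choices made for the demonstration.

```python
def lagrange_interpolate(xs, ys, x):
    """Value at x of the polynomial that fits the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        w = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                w *= (x - xj) / (xi - xj)
        total += w * yi
    return total


def runge(x):
    return 1.0 / (x * x + 1.0)


def equispaced_error(n, x, a=-5.0, b=5.0):
    """Error at x when [a, b] is divided into n equal parts and all ordinates are fitted."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    ys = [runge(t) for t in xs]
    return abs(lagrange_interpolate(xs, ys, x) - runge(x))


# Inside |x| < 3.63 the error shrinks as n grows; outside it grows.
for x in (0.25, 4.75):
    print(x, equispaced_error(10, x), equispaced_error(20, x))
```

At x = 0.25 the error decreases when n is increased from 10 to 20, whereas at x = 4.75 it increases markedly, in accordance with Runge's result.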

Indeed, it is known (see Krylov [1962]) that if f(z) is analytic in the region
R comprising the points which lie on or inside one or both of the circles with
centers at z = a and z = b and radius b - a, then any infinite collocative
interpolation sequence will converge to f(x) everywhere on [a, b] regardless of
the distribution of the successive points of collocation, provided only that they
all lie in [a, b] and that their number becomes infinite.


4.12 Tables of Interpolation Coefficients 


This section provides brief tables of coefficients relevant to the interpolation 
formulas which have been considered. For more elaborate tables, the references 
cited in Sec. 4.13 should be consulted.


LAGRANGE FIVE-POINT INTERPOLATION†


$f_s \approx L_{-2}(s) f_{-2} + L_{-1}(s) f_{-1} + L_0(s) f_0 + L_1(s) f_1 + L_2(s) f_2$
(for negative s, use lower column labels)


   s    L_{-2}(s)   L_{-1}(s)    L_0(s)     L_1(s)     L_2(s)

  0.0    0.000000    0.000000   1.000000   0.000000   0.000000    0.0
  0.1    0.007838   -0.059850   0.987525   0.073150  -0.008663   -0.1
  0.2    0.014400   -0.105600   0.950400   0.158400  -0.017600   -0.2
  0.3    0.019338   -0.136850   0.889525   0.254150  -0.026163   -0.3
  0.4    0.022400   -0.153600   0.806400   0.358400  -0.033600   -0.4
  0.5    0.023438   -0.156250   0.703125   0.468750  -0.039063   -0.5
  0.6    0.022400   -0.145600   0.582400   0.582400  -0.041600   -0.6
  0.7    0.019338   -0.122850   0.447525   0.696150  -0.040163   -0.7
  0.8    0.014400   -0.089600   0.302400   0.806400  -0.033600   -0.8
  0.9    0.007838   -0.047850   0.151525   0.909150  -0.020663   -0.9
  1.0    0.000000    0.000000   0.000000   1.000000   0.000000   -1.0
  1.1   -0.008663    0.051150  -0.146475   1.074150   0.029838   -1.1
  1.2   -0.017600    0.102400  -0.281600   1.126400   0.070400   -1.2
  1.3   -0.026163    0.150150  -0.398475   1.151150   0.123338   -1.3
  1.4   -0.033600    0.190400  -0.489600   1.142400   0.190400   -1.4
  1.5   -0.039063    0.218750  -0.546875   1.093750   0.273438   -1.5
  1.6   -0.041600    0.230400  -0.561600   0.998400   0.374400   -1.6
  1.7   -0.040163    0.220150  -0.524475   0.849150   0.495338   -1.7
  1.8   -0.033600    0.182400  -0.425600   0.638400   0.638400   -1.8
  1.9   -0.020663    0.111150  -0.254475   0.358150   0.805838   -1.9
  2.0    0.000000    0.000000   0.000000   0.000000   1.000000   -2.0

         L_2(s)      L_1(s)     L_0(s)    L_{-1}(s)  L_{-2}(s)     s


NOTE: All coefficients become exact if each terminal 8 is replaced by 75, and each terminal 3 by 25.
† See Sec. 3.4 for three-point coefficients.


STIRLING INTERPOLATION

$f_s \approx f_0 + s\,\mu\delta f_0 + C_2(s)\,\delta^2 f_0 + C_3(s)\,\mu\delta^3 f_0 + C_4(s)\,\delta^4 f_0$


   s     C_2(s)     C_3(s)      C_4(s)       s

  0      0.00000    0.00000     0.00000     0
  0.1    0.00500   -0.01650†   -0.00041    -0.1
  0.2    0.02000   -0.03200†   -0.00160    -0.2
  0.3    0.04500   -0.04550†   -0.00341    -0.3
  0.4    0.08000   -0.05600†   -0.00560    -0.4
  0.5    0.12500   -0.06250†   -0.00781    -0.5
  0.6    0.18000   -0.06400†   -0.00960    -0.6
  0.7    0.24500   -0.05950†   -0.01041    -0.7
  0.8    0.32000   -0.04800†   -0.00960    -0.8
  0.9    0.40500   -0.02850†   -0.00641    -0.9
  1.0    0.50000    0.00000     0.00000    -1.0


† Change sign when reading s from right-hand column.


BESSEL INTERPOLATION

$f_s \approx \mu f_{1/2} + (s - \tfrac{1}{2})\,\delta f_{1/2} + C_2(s)\,\mu\delta^2 f_{1/2} + C_3(s)\,\delta^3 f_{1/2} + C_4(s)\,\mu\delta^4 f_{1/2} + C_5(s)\,\delta^5 f_{1/2}$


   s     C_2(s)     C_3(s)     C_4(s)     C_5(s)       s

  0      0.00000    0.00000    0.00000    0.00000     1.0
  0.1   -0.04500    0.00600†   0.00784   -0.00063†    0.9
  0.2   -0.08000    0.00800†   0.01440   -0.00086†    0.8
  0.3   -0.10500    0.00700†   0.01934   -0.00077†    0.7
  0.4   -0.12000    0.00400†   0.02240   -0.00045†    0.6
  0.5   -0.12500    0.00000    0.02344    0.00000     0.5


† Change sign when reading s from right-hand column.


EVERETT INTERPOLATION

$f_s \approx (1 - s) f_0 + C_2(s)\,\delta^2 f_0 + C_4(s)\,\delta^4 f_0 + s f_1 + C_2(1 - s)\,\delta^2 f_1 + C_4(1 - s)\,\delta^4 f_1$


   s     C_2(s)     C_4(s)

  0      0.00000    0.00000
  0.1   -0.02850    0.00455
  0.2   -0.04800    0.00806
  0.3   -0.05950    0.01044
  0.4   -0.06400    0.01165
  0.5   -0.06250    0.01172
  0.6   -0.05600    0.01075
  0.7   -0.04550    0.00890
  0.8   -0.03200    0.00634
  0.9   -0.01650    0.00329
  1.0    0.00000    0.00000


STEFFENSEN INTERPOLATION†

$f_s \approx f_0 + C_1(s)\,\delta f_{1/2} + C_3(s)\,\delta^3 f_{1/2} - C_1(-s)\,\delta f_{-1/2} - C_3(-s)\,\delta^3 f_{-1/2}$


   s     C_1(s)     C_3(s)

 -0.5   -0.12500    0.02344
 -0.4   -0.12000    0.02240
 -0.3   -0.10500    0.01934
 -0.2   -0.08000    0.01440
 -0.1   -0.04500    0.00784
  0      0.00000    0.00000
  0.1    0.05500   -0.00866
  0.2    0.12000   -0.01760
  0.3    0.19500   -0.02616
  0.4    0.28000   -0.03360
  0.5    0.37500   -0.03906


† See Prob. 19.


NEWTON INTERPOLATION

$f_s \approx f_0 + s\,\Delta f_0 + C_2(s)\,\Delta^2 f_0 + C_3(s)\,\Delta^3 f_0 + C_4(s)\,\Delta^4 f_0 + C_5(s)\,\Delta^5 f_0$
$f_{N-s} \approx f_N - s\,\nabla f_N + C_2(s)\,\nabla^2 f_N - C_3(s)\,\nabla^3 f_N + C_4(s)\,\nabla^4 f_N - C_5(s)\,\nabla^5 f_N$

(s positive for interpolation)


   s     C_2(s)     C_3(s)     C_4(s)     C_5(s)

 -1.0    1.00000   -1.00000    1.00000   -1.00000
 -0.9    0.85500   -0.82650    0.80584   -0.78972
 -0.8    0.72000   -0.67200    0.63840   -0.61286
 -0.7    0.59500   -0.53550    0.49534   -0.46562
 -0.6    0.48000   -0.41600    0.37440   -0.34445
 -0.5    0.37500   -0.31250    0.27344   -0.24609
 -0.4    0.28000   -0.22400    0.19040   -0.16755
 -0.3    0.19500   -0.14950    0.12334   -0.10607
 -0.2    0.12000   -0.08800    0.07040   -0.05914
 -0.1    0.05500   -0.03850    0.02984   -0.02447
  0      0.00000    0.00000    0.00000    0.00000

  0.1   -0.04500    0.02850   -0.02066    0.01612
  0.2   -0.08000    0.04800   -0.03360    0.02554
  0.3   -0.10500    0.05950   -0.04016    0.02972
  0.4   -0.12000    0.06400   -0.04160    0.02995
  0.5   -0.12500    0.06250   -0.03906    0.02734
  0.6   -0.12000    0.05600   -0.03360    0.02285
  0.7   -0.10500    0.04550   -0.02616    0.01727
  0.8   -0.08000    0.03200   -0.01760    0.01126
  0.9   -0.04500    0.01650   -0.00866    0.00537
  1.0    0.00000    0.00000    0.00000    0.00000


4.13 Supplementary References 


Standard references include Steffensen [1950], Milne-Thomson [1951],
Nörlund [1954], and Jordan [1965].

H. T. Davis [1963] includes tables of interpolation coefficients for the 
formulas of Newton, Stirling, Bessel, and Everett, together with corresponding 
coefficients for numerical differentiation. Other tabulations are listed in the 
index by Fletcher et al. [1962]. Davis also lists formulas which provide ap- 
proximations to the results of inverting truncated Newton and Everett formulas 
for the purpose of inverse interpolation [see also Prob. 30 (this chapter) and 
Sec. 2.9]. Salzer [1943, 1944] gives tables of the relevant coefficient functions. 

Miller [1950] gives a valuable discussion of the use of difference tables in
the detection of errors. See Hamming [1971] for a probabilistic treatment of
the effects of inherent roundoff errors on difference columns. Comrie [1956]
includes additional throwback techniques.

The divergence of the interpolation sequence for $f(x) = 1/(x^2 + 1)$ in
[-5, 5] investigated by Runge [1901] is also considered by Steffensen [1950].
The example of f(x) = |x| in [-1, 1] is studied by Bernstein [1912b]. See
also Cheney [1966]. Conditions ensuring convergence of interpolation sequences
(and of their integrals) in finite intervals (see Prob. 42) are established by
Krylov [1962]. General results relative to interpolation series are obtained by
Nörlund [1926, 1954].

Interpolation in two-way tables [see Prob. 24 (this chapter) and Chap. 5,
Probs. 4 and 5] is treated by Steffensen [1950], Willers [1950], and, in more
detail, Pearson [1920b]. For more recent contributions, see Southard [1956],
Thacher [1960], and Thacher and Milne [1960].


PROBLEMS 


Section 4.2 
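1 Show that

$$
\begin{aligned}
\Delta^r f_k &= \nabla^r f_{k+r} = \delta^r f_{k+r/2} \\
\nabla^r f_k &= \Delta^r f_{k-r} = \delta^r f_{k-r/2} \\
\delta^r f_k &= \Delta^r f_{k-r/2} = \nabla^r f_{k+r/2} \\
\Delta(f_{k-1}\,\Delta g_{k-1}) &= \nabla(f_k\,\Delta g_k) = \Delta(f_{k-1}\,\nabla g_k) = \nabla(f_k\,\nabla g_{k+1}) \\
\Delta\nabla f_k &= \nabla\Delta f_k = \delta^2 f_k
\end{aligned}
$$

2 Show that

$$
\begin{aligned}
\Delta(f_k g_k) &= f_k\,\Delta g_k + g_{k+1}\,\Delta f_k &
\Delta(f_k^2) &= (f_k + f_{k+1})\,\Delta f_k \\
\Delta\!\left(\frac{f_k}{g_k}\right) &= \frac{g_k\,\Delta f_k - f_k\,\Delta g_k}{g_k g_{k+1}} &
\Delta\!\left(\frac{1}{f_k}\right) &= -\frac{\Delta f_k}{f_k f_{k+1}}
\end{aligned}
$$

3 Show that

$$
\begin{aligned}
\Delta^r\!\left[\frac{1}{x}\right] &= \frac{(-1)^r\,r!\,h^r}{x(x + h)\cdots(x + rh)} \\
\Delta \cos(\omega x + \alpha) &= 2 \sin\frac{\omega h}{2}\,
   \cos\!\left(\omega x + \alpha + \frac{\omega h}{2} + \frac{\pi}{2}\right) \\
\Delta^r \cos(\omega x + \alpha) &= \left(2 \sin\frac{\omega h}{2}\right)^{r}
   \cos\!\left(\omega x + \alpha + \frac{r\omega h}{2} + \frac{r\pi}{2}\right)
\end{aligned}
$$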


Section 4.3 


4 Calculate approximate values of f(x) = sin x for x = 0.50(0.02)0.70 and for
1.50(0.02)1.70, by applying the appropriate newtonian formula to the following
rounded data:

   x      0.5      0.7      0.9      1.1      1.3      1.5      1.7
  f(x)  0.47943  0.64422  0.78333  0.89121  0.96356  0.99749  0.99166

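5 Obtain the formulas

$$
\begin{aligned}
h f'_s &= \Delta f_0 + \tfrac{1}{2}(2s - 1)\,\Delta^2 f_0 + \tfrac{1}{6}(3s^2 - 6s + 2)\,\Delta^3 f_0 \\
&\quad + \tfrac{1}{12}(2s^3 - 9s^2 + 11s - 3)\,\Delta^4 f_0 \\
&\quad + \tfrac{1}{120}(5s^4 - 40s^3 + 105s^2 - 100s + 24)\,\Delta^5 f_0 + \cdots
\end{aligned}
$$

and

$$
\begin{aligned}
\frac{1}{h}\int_{x_0}^{x_0 + sh} f(x)\,dx &= s f_0 + \tfrac{1}{2}s^2\,\Delta f_0
   + \tfrac{1}{12}s^2(2s - 3)\,\Delta^2 f_0 \\
&\quad + \tfrac{1}{24}s^2(s - 2)^2\,\Delta^3 f_0
   + \tfrac{1}{720}s^2(6s^3 - 45s^2 + 110s - 90)\,\Delta^4 f_0 \\
&\quad + \tfrac{1}{1440}s^2(2s^4 - 24s^3 + 105s^2 - 200s + 144)\,\Delta^5 f_0 + \cdots
\end{aligned}
$$
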
and also obtain corresponding formulas for $h f'_{N+s}$ and for
$h^{-1}\int_{x_N + sh}^{x_N} f(x)\,dx$ in terms of backward differences.
6 Use the data of Prob. 4 and the results of Prob. 5 to obtain approximate values of
$f'(0.6)$, $f'(1.6)$, $f''(0.6)$, $f''(1.6)$, and of $\int_{0.5}^{0.6} f(x)\,dx$,
$\int_{1.6}^{1.7} f(x)\,dx$.


Section 4.4 


7 


Show that Gauss’ forward and backward formulas (without error terms) are 
truncations of the formal series 


: 5 —[/str—1\ 5, en a eee 
cot (sir Σ (7 Yor ys) orne 


and 


δ — S+ r\ o> S+ Fr or+1 
ts ~ fo + (᾿ Of_1);2 + >, ( ae ) δ + ( 5 ἢ ὃ Δι] 


respectively. 

Calculate approximate values of f(1.0) from the data of Prob. 4, first by use of 
Gauss’ forward formula launched from x = 0.9, and second by use of the back- 
ward formula launched from x = 1.1. 

9 By specializing Sheppard’s rules for the formation of interpolation formulas to 
the case when the relevant abscissas are at a uniform spacing h, show that the 
coefficient of the kth difference encountered in a continuous difference path can be 
obtained by dividing by k! the product of k factors, each of which represents the 
distance between the abscissa of the interpolant and one of the abscissas lying in the 
region of determination of the preceding difference in the path, in units of the spacing, 
if the result of truncating the interpolation formula with the kth difference is to 
yield exact results at all points involved in its formation. Also, illustrate the use 
of this rule by writing down the forward and backward formulas of Newton and 
Gauss. 

10 Show that the result of truncating the Gauss forward formula with the fourth 
difference can be written in the form 


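\[
f_s \approx f_0 + s\left\{\delta f_{1/2} + \frac{s-1}{2}\left[\delta^2 f_0 + \frac{s+1}{3}\left(\delta^3 f_{1/2} + \frac{s-2}{4}\,\delta^4 f_0\right)\right]\right\}
\]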


where the evaluation of the formula is conveniently effected from right to left, and 
write the backward formula, as well as the two Newton formulas, in similar forms. 


Section 4.5 


11 Show that Stirling’s formula is a truncation of the formal series 
\[
f_s \sim f_0 + \sum_{r=1}^{\infty}\left[\binom{s+r-1}{2r-1}\,\mu\delta^{2r-1} f_0 + \frac{s}{2r}\binom{s+r-1}{2r-1}\,\delta^{2r} f_0\right]
\]


12 Use Stirling’s formula to calculate approximate values of f(x) for the points 
x = 1.00(0.02)1.20 from the data of Prob. 4. 
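13 Obtain the formulas 

\[
\begin{aligned}
h f'_s = {} & \mu\delta f_0 + s\,\delta^2 f_0 + \tfrac{1}{6}(3s^2 - 1)\,\mu\delta^3 f_0 + \tfrac{1}{12}s(2s^2 - 1)\,\delta^4 f_0 \\
& + \tfrac{1}{120}(5s^4 - 15s^2 + 4)\,\mu\delta^5 f_0 + \tfrac{1}{360}s(3s^4 - 10s^2 + 4)\,\delta^6 f_0 + \cdots
\end{aligned}
\]

and 

\[
\begin{aligned}
\frac{1}{h}\int_{x_0 - hs}^{x_0 + hs} f(x)\,dx = {} & 2s f_0 + \tfrac{1}{3}s^3\,\delta^2 f_0 + \tfrac{1}{180}s^3(3s^2 - 5)\,\delta^4 f_0 \\
& + \tfrac{1}{7560}s^3(3s^4 - 21s^2 + 28)\,\delta^6 f_0 + \cdots
\end{aligned}
\]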


and use them to calculate approximate values of f’(1.1), f’1.0), f’0.D), f “(1.0), 
and of [1:2 f(x) dx, [6:3 f(x) dx from the data of Prob. 4. 


Section 4.6 
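14 Show that Bessel’s formula is a truncation of the formal series 

\[
f_s \sim \mu f_{1/2} + \left(s - \tfrac12\right)\delta f_{1/2}
+ \sum_{r=1}^{\infty}\left[\binom{s+r-1}{2r}\,\mu\delta^{2r} f_{1/2}
+ \frac{s - \tfrac12}{2r+1}\binom{s+r-1}{2r}\,\delta^{2r+1} f_{1/2}\right]
\]

15 Use Bessel’s formula to calculate approximate values of f(x) for the points 
x = 0.90(0.02)1.10 from the data of Prob. 4. 
16 Obtain the formulas 

\[
\begin{aligned}
h f'_{t+1/2} = {} & \delta f_{1/2} + t\,\mu\delta^2 f_{1/2} + \tfrac{1}{24}(12t^2 - 1)\,\delta^3 f_{1/2} + \tfrac{1}{24}t(4t^2 - 5)\,\mu\delta^4 f_{1/2} \\
& + \tfrac{1}{1920}(80t^4 - 120t^2 + 9)\,\delta^5 f_{1/2} + \cdots
\end{aligned}
\]

and 

\[
\begin{aligned}
\frac{1}{h}\int_{x_{1/2} - th}^{x_{1/2} + th} f(x)\,dx = {} & 2t\,\mu f_{1/2} + \tfrac{1}{12}t(4t^2 - 3)\,\mu\delta^2 f_{1/2} \\
& + \tfrac{1}{2880}t(48t^4 - 200t^2 + 135)\,\mu\delta^4 f_{1/2} + \cdots
\end{aligned}
\]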

where x12 = (Xo + x,)/2, and use them to calculate approximate values of f'A.1), 
f'°1.0), f°.1), £10), and of [5:5 f(x) dx, [3:3 f(x) dx from the data of Prob. 4. 
Section 4.7 


17 Use Everett’s first formula to obtain approximate values of f(x) = sin x for 
x = 1.00(0.02)1.20 from the following data: 

x      f(x)       δ²       δ⁴ 
0.9    0.78333    −3123    125 
1.1    0.89121    −3553    141 
1.3    0.96356    −3842    155 


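18 By integrating Everett’s first formula, obtain the formula 

\[
\frac{1}{h}\int_{x_0}^{x_1} f(x)\,dx = \mu f_{1/2} - \tfrac{1}{12}\,\mu\delta^2 f_{1/2} + \tfrac{11}{720}\,\mu\delta^4 f_{1/2} - \tfrac{191}{60480}\,\mu\delta^6 f_{1/2} + \cdots
\]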


and verify that it follows also from the result of Prob. 16. 
19 Derive Steffensen’s formula (Everett’s second formula) in the form 
+ 1)s(s — 1) 
4! 
2 (s+ m+ 1\(s + m)---(s -- Mm) som+ig 
(2m + 2)! 
— 1 +1 — 1)\(s — 2 
ΗΝ Ss(s ΝΣ 3 (s )s(s Ms ) 53.115 er 
2! 4! 
_ + mis + m— 1)... -- κι -- 1) 
(2m + 2)! 


fe = fo + SHO hyp + GAME Phys toe 


out eee ae E, 


where 


-- f2m+3 s(s* — 1*):--(s* -- m+ 1%) 2m+3) 
es Cae Νὰ, 


Also show that it can be put in the form 


1 2 
Is % fo + ("5 ) in + ¢ ) hint 


—~s+1 —-s+2\ .3 
a, δ. — Of. — δ 8 
( ᾿ ) ar ( 4 ) F-1/2 


so that one set of coefficients is obtained from the other by replacing s by −s. 
20 Use Steffensen’s formula to obtain approximate values of f(x) = sin x for 
x = 1.00(0.02)1.20 from the following data: 

x      f(x)       δ³ 
0.9    0.78333 
                  −430 
1.1    0.89121 
                  −289 
1.3    0.96356 
Section 4.8 
21 Sketch the function $\pi(x) = x(x - 1)(x - 2)(x - 3)(x - 4)$ over the interval 
$-1 \le x \le 5$. Noticing that the error associated with the approximation of f(x) 
by the result of retaining fourth differences in either of Newton’s or Gauss’ 
interpolation formulas is of the form $\pi(x) f^{(5)}(\xi)/120$ for some $\xi$, if $f^{(5)}(x)$ is continuous, 
when the ordinates at x = 0, 1, 2, 3, and 4 are employed, and assuming that 
Newton’s formulas would be used principally in [0, 1] and [3, 4], whereas 
central-difference formulas would be used principally in [1, 3], account for the fact that 
the former are sometimes erroneously said to be “less accurate” than the latter. 
If interpolations were effected in the interval [1, 3] by both Newton and Gauss 
formulas, based on the five ordinates at x = 0, 1, 2, 3, and 4, and if no roundoffs 
were committed, how would the results actually compare in accuracy? What 
evidence is afforded by the graph with respect to the general relative dependability 
of interpolation and extrapolation? 

22 If f(x) = (1 + x), determine the Stirling and Bessel approximations over [0, 1] 

corresponding to a spacing h = 1, with $x_0 = 0$ and $x_1 = 1$, and corresponding 

to the successive retention of differences through the first, second, third, and fourth. 

Then calculate the error in each of these eight approximations for x = 0.0(0.2)1.0, 

retaining only one decimal place, and plot the error curves in a common graph. 

Thus show that, in this example, the following facts are true over [0, 1]: 

(a) The Stirling mean first-difference approximation is better than that which also 
incorporates the second difference over most of the interval, and the mean 
third-difference approximation is better than that which also incorporates the 
fourth difference over the entire interval. 

(b) The Bessel mean second-difference approximation is better than that which 
also incorporates the third difference over half the interval. 

(c) The Stirling mean first-difference approximation is better than the three Bessel 
approximations which employ the first difference, the mean second difference, 
and/or the third difference, near x = 4 as well as near x = 0. 

(d) The Bessel fourth-difference approximation is much better than all the others 
and is followed successively by the Stirling third-difference and the Stirling 
fourth-difference approximations. 

(Compare the results of Prob. 23.) 


23 Proceed as in Prob. 22 with f(x) = cos (πx/8), retaining five decimal places. 

Thus show that, in this example, the following facts are true over [0, 1]: 

(a) The Stirling zeroth-difference approximation is better than the Bessel first- 
difference approximation over half the interval, whereas the Bessel mean 
zeroth-difference approximation is better than the Stirling mean first-difference 
approximation over the remainder of the interval. 

(b) The Bessel mean second-difference approximation is better than that which 
also incorporates the third difference over half the interval. 

(c) The Stirling second-difference approximation is better than the Bessel third- 
difference approximation over most of the interval. 

(d) The Stirling fourth-difference approximation is better than all the others and 
is followed successively by the Bessel fourth-difference and the Stirling second- 
difference approximations. 

(Compare the results of Prob. 22.) 

24 The following data represent rounded four-place values of the elliptic-integral 
function 

\[
E(x, y) = \int_0^{y}\sqrt{1 - \sin^2 x\,\sin^2 t}\;dt:
\]


54° 58° 62° 

0.8060 0.7988 0.7920 
0.8332 0.8251 0.8174 
0.8598 0.8508 0.8422 
0.8859 0.8759 0.8663 


Determine an approximation to E(52°, 51°) by (a) interpolating horizontally to 
obtain E(52°, y) for y = 50°(2°)56° and then interpolating these values vertically, 
(b) interpolating vertically then horizontally, and (c) interpolating directly along 
a diagonal. Also interpolate as accurately as possible for E(55.4°, 53.1°) by any 
method. 


Section 4.9 


25 Construct a difference table, corresponding to the results of rounding true values 
of f(x) = x° for x = 1.0(0.1)3.0 to two decimal places, and study the propagated 
effects of the roundoff errors. Also compare the mean absolute values of the third, 
fourth, and fifth differences with the ideal values, and show that a more regular 
difference array would result from an improper rounding of the values corresponding 
to x = 1.4 and x = 2.6 by one unit. 

26 Certain of the following 20 consecutive values, corresponding to equally spaced 
arguments, are incorrect because of typical copying errors. Locate the errors and 
correct them. 


17278 48818 79779 112630 
23424 34440 86249 119398 
29585 60723 92752 126246 
35764 67041 99318 133180 
41964 73398 105937 140206 

Section 4.10 


27 Show that the additional error R(s) introduced into Bessel’s formula, by replacing 
$\mu\delta^2 f_{1/2}$ by $\mu(\delta^2 f_{1/2} - k\,\delta^4 f_{1/2})$ and neglecting $\mu\delta^4 f_{1/2}$ otherwise, can be expressed 
in the form 

\[
R(s) = \tfrac{1}{24}\left[(s^2 - s)^2 + (12k - 2)(s^2 - s)\right]\mu\delta^4 f_{1/2}
\]

and that the extreme values of the coefficient of $\mu\delta^4 f_{1/2}$ for $0 \le s \le 1$ occur 
when $s = \tfrac12$ and when $s^2 - s = 1 - 6k$ and are given by $(3 - 16k)/128$ and 
$-(1 - 6k)^2/24$, respectively. Show also that the requirement that the extreme 
values be of equal magnitude and opposite sign (so that the maximum additional 
error is minimized) gives $k = (3 + \sqrt{2})/24 \approx 0.184$, and that R(s) then varies 
between the limits $\pm(3 - 2\sqrt{2})\,\mu\delta^4 f_{1/2}/384 \approx \pm 0.00045\,\mu\delta^4 f_{1/2}$. 

28 Show that the additional error R(s) introduced into Everett’s first formula, by 
replacing $\delta^2 f_0$ and $\delta^2 f_1$ by $\delta^2 f_0 - k\,\delta^4 f_0$ and $\delta^2 f_1 - k\,\delta^4 f_1$ and neglecting fourth 
differences otherwise, is identical with that associated with Bessel’s formula when 
$\delta^4 f_0 = \delta^4 f_1$; hence deduce that if k is assigned the same value as for Bessel’s 
formula (Prob. 27), then the additional error cannot exceed 0.00045 times the 
larger of $|\delta^4 f_0|$ and $|\delta^4 f_1|$ if the two fourth differences have the same sign. Show 
also that if those differences are equal in magnitude and of opposite sign, then 
R(s) is given by 

\[
\begin{aligned}
R(s) &= \pm(2s - 1)\left[\frac{1}{5}\binom{s+1}{4} + \frac{k}{3}\binom{s}{2}\right]\delta^4 f_1 \\
&= \pm\frac{1}{1920}\left[t^5 - 10t^3 + 9t + 80k(t^3 - t)\right]\delta^4 f_1 \qquad (t = 2s - 1)
\end{aligned}
\]

and show that the maximum additional error for $0 \le s \le 1$ is smaller than 
$0.00122\,|\delta^4 f_1|$ in magnitude. Thus deduce that $|R(s)| < 0.00122M$ in the general 
case, where M is the larger of $|\delta^4 f_0|$ and $|\delta^4 f_1|$. 

29 Solve Prob. 17 by using the throwback technique. 

30 Show that Everett’s modified second-difference formula can be expressed in the 
form 


go lp apy ett =D (G2 2)0 p= δ 1}: 
5 eae ih fo) + PHP fo - 2) δὴ - + NFA 


for inverse interpolation between fo and f,, when f, is given. Use it iteratively, first 
replacing s by zero in the coefficients in the right-hand member to calculate an 
initial approximation to the required value of s, and then successively introducing 
each new approximation into the right-hand member to obtain the next one, to 
determine approximately the value of x for which f(x) = 0.9 with the data of 
Prob. 17. 

Section 4.11 

31 Show that the error corresponding to the truncation of the series in (4.11.5) with 
the nth term is of the form 

\[
E_n(x) = \frac{a^{n+1}}{(n+1)!}\,x(x - h)\cdots(x - nh)\,e^{a\xi_n}
\]

for some $\xi_n$, and deduce that, if $x < nh$, the errors $E_n(x)$ and $E_{n+1}(x)$ are of 
opposite sign, so that the error is smaller than the first term omitted and of the same 
sign (see Prob. 6 of Chap. 1). Under the assumption that $e^{ah} > 2$, so that (4.11.5) 
diverges, show also that the term corresponding to the (k + 1)th difference is 
smaller than the preceding one so long as k does not exceed $k_0$, where $k_0$ is the 
integral part of $[(e^{ah} - 1)x + h]/[h(e^{ah} - 2)]$. 

32 Illustrate the results of Prob. 31 by calculating successive approximations to $e^x$ 
from successive partial sums of the Newton interpolation series (4.11.5) with a = 1 
and h = 1, when x = 0.5. In particular, show that the best approximation is 
afforded by retention of only two differences, that a consideration of the first 
neglected term gives the result $1.49008 < e^{0.5} < 1.80716$, and that the mean of 
these limits differs from the true value by one unit in the fourth decimal place. 

33 If $f(x) = e^{ax}$, show that 

\[
\delta^{2r} f(0) = \left(2\sinh\frac{ah}{2}\right)^{2r} \qquad
\mu\delta^{2r+1} f(0) = \left(2\sinh\frac{ah}{2}\right)^{2r}\sinh ah
\]

and deduce that the formal Stirling series centered at x = 0 is of the form 

\[
\begin{aligned}
e^{ax} \sim {} & 1 + \frac{x}{h}\sinh ah + \frac{x^2}{2!\,h^2}\left(2\sinh\frac{ah}{2}\right)^{2}
+ \frac{x(x^2 - h^2)}{3!\,h^3}\left(2\sinh\frac{ah}{2}\right)^{2}\sinh ah \\
& + \frac{x^2(x^2 - h^2)}{4!\,h^4}\left(2\sinh\frac{ah}{2}\right)^{4} + \cdots
\end{aligned}
\]

34 Calculate six successive approximations to the value of $e^x$ when x = 0.5 by use 
of Stirling’s formula centered at x = 0, with h = 1, and investigate the trend of 
the successive deviations from the true value. (Notice that the infinite Stirling series 
itself is convergent in this case.) 

35 By successively equating the even and odd parts of the two members of the 
expansion of Prob. 33 and taking h = 1, deduce the formal expansions 

\[
\cosh ax \sim 1 + \frac{x^2}{2!}\,\beta^2 + \frac{x^2(x^2 - 1^2)}{4!}\,\beta^4 + \frac{x^2(x^2 - 1^2)(x^2 - 2^2)}{6!}\,\beta^6 + \cdots
\]

and 

\[
\frac{\sinh ax}{\sinh a} \equiv \frac{1}{\beta}\,\frac{\sinh ax}{\cosh(a/2)}
\sim x + \frac{x(x^2 - 1^2)}{3!}\,\beta^2 + \frac{x(x^2 - 1^2)(x^2 - 2^2)}{5!}\,\beta^4 + \cdots
\]

where $\beta = 2\sinh(a/2)$. Show also that these series converge when $|\beta| < 2$. 




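36 Show that the formal Bessel-series representation of $f(x) = e^{ax}$, centered at 
x = h/2, is of the form 

\[
e^{ax} \sim e^{ah/2}\left[\cosh\frac{ah}{2} + \frac{2x - h}{2h}\,\beta + \frac{x(x - h)}{2!\,h^2}\,\beta^2\cosh\frac{ah}{2}
+ \frac{x(x - h)(2x - h)}{2\cdot 3!\,h^3}\,\beta^3 + \cdots\right]
\]

where $\beta = 2\sinh(ah/2)$. Also, by taking h = 1, replacing x by $x + \tfrac12$, and 
successively equating the even and odd parts of the result, deduce the expansions 

\[
\frac{\cosh ax}{\cosh(a/2)} \sim 1 + \frac{x^2 - \tfrac14}{2!}\,\beta^2 + \frac{(x^2 - \tfrac14)(x^2 - \tfrac94)}{4!}\,\beta^4 + \cdots
\]

and 

\[
\sinh ax \equiv \beta\,\frac{\sinh ax}{2\sinh(a/2)}
\sim x\beta + \frac{x(x^2 - \tfrac14)}{3!}\,\beta^3 + \frac{x(x^2 - \tfrac14)(x^2 - \tfrac94)}{5!}\,\beta^5 + \cdots
\]

where $\beta = 2\sinh(a/2)$, and show that these series converge when $|\beta| < 2$. 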
37 Let f(x) be a function such that f(kh) = 0 (k = ±1, ±2, . . .) and f(0) = 1. 
(a) Obtain the formal Newton series 

\[
f(x) \sim 1 - \frac{x}{1!\,h} + \frac{x(x - h)}{2!\,h^2} - \cdots + (-1)^n\,\frac{x(x - h)\cdots[x - (n - 1)h]}{n!\,h^n} + \cdots
\]

and the corresponding Stirling series 

\[
f(x) \sim 1 - \frac{x^2}{(1!\,h)^2} + \frac{x^2(x^2 - h^2)}{(2!\,h^2)^2} - \cdots
+ (-1)^n\,\frac{x^2(x^2 - h^2)\cdots[x^2 - (n - 1)^2h^2]}{(n!\,h^n)^2} + \cdots
\]

(b) Show that the Newton series converges for $x \ge 0$ and diverges when x < 0, 
and that the Stirling series converges for all x.† 


† Use may be made of a test associated with Raabe and Gauss, which states that if all 
terms of a real series are of the same sign after a certain point, and if the ratio of the 
(n + 1)th term to the nth term can be expressed in the form 

\[
1 - \frac{c}{n} + \frac{\theta_n}{n^2}
\]

where c is independent of n and $\theta_n$ is bounded as $n \to \infty$, then the series converges 
if c > 1 and diverges otherwise unless it terminates. (See Knopp [1956].) 


(c) Verify that the Newton series can be obtained as the limit as $t \to 1-$ of the 
binomial expansion of $(1 - t)^{x/h}$ in powers of t, and deduce that the series 
converges to a function f(x) such that f(x) = 0 when x > 0 and f(0) = 1. 
[The Stirling series, on the other hand, can be shown to converge to the 
function $f(x) = (\sin \pi x/h)/(\pi x/h)$.] 

(Note, for example, that accordingly the effect on the value of the nth interpolation 
polynomial at x = h/2, due to an isolated error $\epsilon$ in the ordinate at x = 0, would 
tend toward $2\epsilon/\pi \approx 0.64\epsilon$ as n increases in the Stirling sequence, but would tend 
to zero in the Newton sequence, when both sequences are “launched” from 
x = 0.) 
38 If f(z) is analytic in a simply-connected region $\mathcal{R}$ of the complex plane which 
includes the segment of the real axis spanned by the points $x_0, x_1, \ldots, x_n$, and 
x, and if C is a simple closed curve in $\mathcal{R}$ which surrounds this segment, use the 
fact that 

\[
f(x) = \frac{1}{2\pi i}\oint_C \frac{f(z)}{z - x}\,dz
\]

to deduce that 

\[
f[x_0, x_1, \ldots, x_n, x] = \frac{1}{2\pi i}\oint_C \frac{f(z)\,dz}{(z - x)\,\pi_n(z)}
\]

where 

\[
\pi_n(z) = (z - x_0)(z - x_1)\cdots(z - x_n)
\]

(Review Prob. 17 of Chap. 2.) 
39 Apply the calculus of residues to the result of Prob. 38 to derive Lagrange’s 
polynomial interpolation identity in the familiar form 

\[
f(x) - \sum_{k=0}^{n}\frac{\pi_n(x)}{(x - x_k)\,\pi_n'(x_k)}\,f(x_k) = E_n(x)
\]

but with the error term 

\[
E_n(x) = \pi_n(x)\,f[x_0, x_1, \ldots, x_n, x]
\]

expressed in the alternative form 

\[
E_n(x) = \frac{1}{2\pi i}\oint_C \frac{\pi_n(x)\,f(z)\,dz}{\pi_n(z)(z - x)}
\]

Deduce also that 

\[
|E_n(x)| \le \frac{ML}{2\pi d}\,\max\left|\frac{\pi_n(x)}{\pi_n(z)}\right|
\]

where $|f(z)| \le M$ on C, L is the length of C, d is the shortest distance from the 
real point x to a point on C, and where, for a given real value of x, $|\pi_n(x)/\pi_n(z)|$ 
is to be maximized for all z on C. 

40 In Prob. 39, suppose that the points $x_0, x_1, \ldots, x_n$ are equally spaced and span 
a fixed interval [a, b] so that, with $h_n = (b - a)/n$, there follows $x_k = x_0 + kh_n$ 
with $x_0 = a$ and $x_n = b$. Suppose also that x is in [a, b]. Show that 

\[
h_n\log|\pi_n(x)| = \sum_{k=0}^{n} h_n\log|x - x_k|
\]

and hence that 

\[
\lim_{n\to\infty}\left[h_n\log|\pi_n(x)|\right] = \int_a^b \log|x - t|\,dt
\]

41 With the abbreviation 

\[
v(x) = \frac{1}{b - a}\int_a^b \log\frac{b - a}{|x - t|}\,dt
\]

deduce from the result of Prob. 40 that 

\[
\left|\frac{\pi_n(z)}{\pi_n(x)}\right| \sim e^{-n[v(z) - v(x)]} \qquad (n \to \infty)
\]

and hence, with the notation of Prob. 39, 

\[
|E_n(x)| \le K_n
\]

where 

\[
K_n \sim \frac{ML}{2\pi d}\,e^{-n[v(x) - v(z)]} \qquad (n \to \infty)
\]

so that the interpolation error $E_n(x)$ tends to zero as $n \to \infty$ for all x in [a, b] 
if C can be so chosen that v(z) < v(x) for all such x and for all z on C. 
42 With the notation of Prob. 41, verify that 


Re | : [ (4) 4] 
ῥ-- α z—t 


Re {1 + log(b -- a) -- τ 


v(z) 


[(z — a) log (z — a) 
a 


+ (6 — z)log(b — a1 


1 + log (6 — a) -- 


5 : : Ια -- a)log V(x -- a)? + γ 


+ (b — x) log V(b — x)? + y? 


—y (tan _* + tan7! —)| 
b-—x x—ajy 
Also show that the locus v(z) = 1 in the complex plane passes through the ends 
of the real interval [a, δ] and that at the midpoint of that interval v(z) = 1 + 
log 2. 
43 It can be shown that the locus v(z) = 1 in Prob. 42 is an oval curve $\Gamma$ (somewhat 
resembling an ellipse) with longitudinal vertices at the ends of the real interval 
[a, b], center at its midpoint, and lateral vertices at a distance of about 
0.26(b − a) from the center. Also it is true that v(z) > 1 everywhere inside $\Gamma$ 
and v(z) < 1 everywhere outside $\Gamma$. Assuming these facts, deduce from the 
results of Probs. 38 to 42 that if f(z) is analytic in a region $\mathcal{R}$ including the curve $\Gamma$, 
then the polynomial interpolation sequence generated by fitting f(x) at n + 1 equally 
spaced points in [a, b] converges to f(x) everywhere in [a, b] and also the 
Newton-Cotes sequence of approximations converges to the integral $\int_a^b f(x)\,dx$ as $n \to \infty$. 
(Notice also that it is sufficient to require that f(z) be analytic inside and on $\Gamma$ 
since then, by the definition of analyticity, f(z) also would be analytic in a region 
$\mathcal{R}$ including $\Gamma$. To deduce convergence of the integration sequence, use must be 
made of the uniformity of the convergence of the interpolation sequence. For a 
detailed treatment, see Krylov [1962].) 


5 


OPERATIONS WITH FINITE DIFFERENCES 


5.1 Introduction 


The purpose of this chapter is twofold: first, to indicate the power and simplicity 
of operational methods in deriving a variety of formulas which are useful in 
various aspects of numerical analysis, and, second, to display certain such 
formulas for convenient reference and for use in following chapters. 

The operational methods which are illustrated supply only the formulas 
themselves and do not furnish the relevant error terms, which therefore must be 
obtained independently. Many of the formulas also could be obtained by 
differentiating or integrating appropriate interpolation formulas, although it 
often would be somewhat more difficult to obtain the rule of formation of the 
general term in an expansion. However, in such cases, it is clearly possible to 
deduce the desired error term by differentiating or integrating the known 
error term relevant to the parent formula. 

In addition to formulas for numerical differentiation and integration, 
generally expressed in terms of forward, backward, or central differences, there 
are included certain formulas which are useful in subtabulation (Sec. 5.7) 
and approximate summation of series (Secs. 5.8, 5.9). The concluding sections 
(Secs. 5.10 and 5.11) deal with the problem of determining an expression for the 
error term relevant to a formula, when the coefficients in the formula are known. 


5.2 Difference Operators 


For many purposes, it is convenient to think of the symbols $\Delta$, $\nabla$, and $\delta$, 
defined in the preceding chapter, as operators, which transform a given function 
f(x) into related functions, according to the laws 

\[
\Delta f(x) = f(x + h) - f(x) \qquad \nabla f(x) = f(x) - f(x - h)
\]
\[
\delta f(x) = f\!\left(x + \frac{h}{2}\right) - f\!\left(x - \frac{h}{2}\right) \tag{5.2.1}
\]


Also, in addition to the averaging operator $\mu$, such that 

\[
\mu f(x) = \frac{1}{2}\left[f\!\left(x + \frac{h}{2}\right) + f\!\left(x - \frac{h}{2}\right)\right] \tag{5.2.2}
\]

we define the shifting operator E such that 

\[
Ef(x) = f(x + h) \tag{5.2.3}
\]

and differential and integral operators D and J with the properties 

\[
Df(x) = f'(x) \tag{5.2.4}
\]

and 

\[
Jf(x) = \int_x^{x+h} f(t)\,dt \tag{5.2.5}
\]


In all these operators except D, the spacing h is implied. When a more explicit 
notation is needed, the spacing may be indicated as a subscript, so that, for 
example, we could write $\delta_{2h} f(x)$ for f(x + h) − f(x − h). In addition to the 
operator J, which is associated with $\Delta$, one can define integral operators which 
correspond similarly to $\delta$ and to $\nabla$; this will not be done here. 

Positive integral powers of the operators are defined by iteration. Also, 
we define the zeroth power of any operator as the identity operator 1, which 
leaves any function unchanged. For the operator E, the power $E^\alpha$ is defined 
for any $\alpha$ so that 

\[
E^\alpha f(x) = f(x + \alpha h) \tag{5.2.6}
\]

assuming the existence of $f(x + \alpha h)$. 
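
The definitions (5.2.1) to (5.2.3) and (5.2.6) can be mirrored directly in code. The following is only an illustrative sketch, not part of the text: it assumes that an operator is represented as a Python function that takes a function f and returns a new function of x, and the names `make_operators`, `forward`, `backward`, `central`, `average`, and `shift` are hypothetical choices made for this example.

```python
import math

def make_operators(h):
    """Build the operators of (5.2.1)-(5.2.3) and (5.2.6) for a fixed spacing h.

    Each operator maps a function f to a new function of x.
    """
    def forward(f):            # (Delta f)(x) = f(x + h) - f(x)
        return lambda x: f(x + h) - f(x)

    def backward(f):           # (nabla f)(x) = f(x) - f(x - h)
        return lambda x: f(x) - f(x - h)

    def central(f):            # (delta f)(x) = f(x + h/2) - f(x - h/2)
        return lambda x: f(x + h / 2) - f(x - h / 2)

    def average(f):            # (mu f)(x) = [f(x + h/2) + f(x - h/2)] / 2
        return lambda x: 0.5 * (f(x + h / 2) + f(x - h / 2))

    def shift(f, alpha=1.0):   # (E^alpha f)(x) = f(x + alpha * h)
        return lambda x: f(x + alpha * h)

    return forward, backward, central, average, shift


forward, backward, central, average, shift = make_operators(h=0.2)

# Powers are defined by iteration: the second central difference of sin at x = 1.1.
second_central = central(central(math.sin))(1.1)
```

With h = 0.2 the iterated central difference reduces to sin 1.3 − 2 sin 1.1 + sin 0.9, which is how positive integral powers of an operator are meant to be read.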

We say that two operators, say, $L_1$ and $L_2$, are equal if $L_1 f(x) = L_2 f(x)$ 
for any function f(x) for which the operations are defined. It is easily verified 
that the seven operators defined here possess the commutative, distributive, 
and associative properties shared by real numbers, so that if $L_1$, $L_2$, and $L_3$ 
are any of these operators, there follows 

\[
\begin{gathered}
L_1 + L_2 = L_2 + L_1 \qquad L_1L_2 = L_2L_1 \\
L_1(L_2 + L_3) = L_1L_2 + L_1L_3 \\
(L_1L_2)L_3 = L_1(L_2L_3) \qquad (L_1 + L_2) + L_3 = L_1 + (L_2 + L_3)
\end{gathered}
\tag{5.2.7}
\]

The exponential law $L^mL^n = L^{m+n}$ is also readily established for each of these 
operators. 

In particular, to show that D and J are commutative, we make the 
calculations 

\[
DJf(x) = \frac{d}{dx}\int_x^{x+h} f(t)\,dt = f(x + h) - f(x) = \int_x^{x+h} f'(t)\,dt = JDf(x)
\]

and so deduce also that 

\[
DJ = JD = \Delta \tag{5.2.8}
\]
We may define $L^{-1}$ as an operator such that 

\[
LL^{-1} = 1 \tag{5.2.9}
\]

so that, if $L^{-1}g = f$, then $LL^{-1}g = Lf$ or $g = Lf$, and refer to $L^{-1}$ as an 
inverse of L. It is important, however, to notice that the inverse operator 
$L^{-1}$ may not be uniquely defined. For if $\omega(x)$ is any function which is 
annihilated by L, so that $L\omega(x) = 0$, and if one interpretation of $L^{-1}g(x)$ is f(x), 
so that Lf(x) = g(x), then another one is $f(x) + \omega(x)$, since also 
$L[f(x) + \omega(x)] = g(x)$. That is, we may write $L^{-1}Lf = f + \omega$, where $\omega$ is 
any function annihilated by L. Conversely, if $L^{-1}Lf = f + \omega$, then it must 
follow that $LL^{-1}Lf = Lf + L\omega$ or $Lf = Lf + L\omega$, so that $L\omega$ must vanish. 
If no function is annihilated by an operator L, there follows 

\[
LL^{-1} = L^{-1}L = 1
\]

and $L^{-1}$ is then said to be a proper inverse. Thus, whereas no function is 
annihilated by E, the operators $\Delta$, $\nabla$, and $\delta$ annihilate any function of period 
h, J annihilates the derivative of any such function, and D annihilates any 
constant. Further, $\mu$ annihilates any so-called odd-harmonic function of period 
2h, that is, any function f(x) for which f(x + h) = −f(x). Hence, care must 
be taken with respect to the order of operations involving the inverses of those 
operators. 

In the case of the operator D, it is seen that $D^{-1}$ corresponds to the 
formation of an indefinite integral, or antiderivative, and the situation described 
corresponds to the fact that, whereas the derivative of that integral is the original 
function, the integral of the derivative involves an arbitrary additive constant. 
On the other hand, it should be noticed that $\Delta D^{-1}f(x)$ is uniquely determined 
since $\Delta$ annihilates the arbitrary constant. In fact, if we write 

\[
D^{-1}f(x) = F(x) + C
\]

we see that 

\[
\Delta D^{-1}f(x) = F(x + h) - F(x) = Jf(x)
\]

so that we may write also 

\[
\Delta D^{-1} = J \tag{5.2.10}
\]

This result follows also by using (5.2.8) to deduce that 

\[
\Delta D^{-1} = JDD^{-1} = J
\]
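
The statement that $\Delta D^{-1}$ is uniquely determined, although $D^{-1}$ is not, can be checked on a polynomial with exact rational arithmetic. This is a minimal sketch under stated assumptions: a polynomial is stored as a list of coefficients in ascending powers, and the helper names `evaluate`, `antiderivative`, and `delta_of_antiderivative`, as well as the test polynomial, are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def evaluate(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... by Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

def antiderivative(coeffs, constant):
    """One interpretation of D^{-1}: integrate term by term, then add `constant`."""
    return [Fraction(constant)] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]

def delta_of_antiderivative(coeffs, x, h, constant):
    """Apply Delta to D^{-1} p, for a chosen additive constant."""
    F = antiderivative(coeffs, constant)
    return evaluate(F, x + h) - evaluate(F, x)

p = [1, 0, 3]                       # p(x) = 1 + 3x^2, a hypothetical test polynomial
x, h = Fraction(2), Fraction(1, 2)

# Three different choices of the arbitrary constant C give the same value of J p(x).
values = [delta_of_antiderivative(p, x, h, C) for C in (0, 7, -100)]
```

Each entry of `values` is the integral of 1 + 3t² from 2 to 5/2, so the arbitrary constant drops out exactly as (5.2.10) asserts.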


In the present chapter, we will be concerned principally with applying 
operators to polynomials. In this connection, we may notice that each of the 
operators $\Delta$, $\nabla$, $\delta$, and D reduces the degree of any polynomial, and that the same 
statement is true of any positive integral powers of these operators. We will 
refer to such operators as reductive operators. (They are sometimes called delta 
or theta operators.) It should be noticed that E, $\mu$, and J are not reductive. 


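From the definitions given, we may obtain the relations 

\[
\Delta = E - 1 \qquad \nabla = 1 - E^{-1} \qquad
\delta = E^{1/2} - E^{-1/2} \qquad \mu = \tfrac{1}{2}\left(E^{1/2} + E^{-1/2}\right) \tag{5.2.11}
\]

whereas (5.2.8) leads also to the relation 

\[
DJ = JD = E - 1 \tag{5.2.12}
\]

so that these operators are simply expressed in terms of E. 
Further, if r is any nonnegative integer, there follows 

\[
\Delta^r = E^r\nabla^r = E^{r/2}\delta^r = (E - 1)^r
= E^r - \frac{r}{1!}E^{r-1} + \frac{r(r - 1)}{2!}E^{r-2} - \cdots + (-1)^{r-1}\frac{r}{1!}E + (-1)^r
\]

and hence, by applying these equal operators to $y(x_k) = y_k$, we obtain the 
useful formulas 

\[
\begin{aligned}
\Delta^r y_k &= y_{k+r} - \binom{r}{1}y_{k+r-1} + \binom{r}{2}y_{k+r-2} - \cdots + (-1)^{r-1}\binom{r}{1}y_{k+1} + (-1)^r y_k \\
\nabla^r y_k &= y_k - \binom{r}{1}y_{k-1} + \binom{r}{2}y_{k-2} - \cdots + (-1)^{r-1}\binom{r}{1}y_{k-r+1} + (-1)^r y_{k-r} \\
\delta^r y_k &= y_{k+r/2} - \binom{r}{1}y_{k+r/2-1} + \cdots + (-1)^{r-1}\binom{r}{1}y_{k-r/2+1} + (-1)^r y_{k-r/2}
\end{aligned}
\tag{5.2.13}
\]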


These relations permit the calculation of an arbitrary difference as a linear 
combination of ordinates, without the formation of a difference table or the 
calculation of differences of lower order, the coefficients of successive ordinates 
being merely binomial coefficients prefixed by alternating signs. 
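
The binomial form of the rth forward difference can be compared with an ordinary difference table. The sketch below is illustrative only: the function names `forward_difference` and `difference_table` and the sample ordinates are assumptions made for this example, and the ordinates are integers so that the comparison is exact.

```python
from math import comb

def forward_difference(y, k, r):
    """r-th forward difference at index k from the first relation of (5.2.13)."""
    return sum((-1) ** j * comb(r, j) * y[k + r - j] for j in range(r + 1))

def difference_table(y):
    """Successive columns of forward differences, built one order at a time."""
    columns = [list(y)]
    while len(columns[-1]) > 1:
        prev = columns[-1]
        columns.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return columns

y = [x ** 3 for x in range(8)]      # hypothetical ordinates y_k = k^3, spacing h = 1
table = difference_table(y)
```

Here `table[r][k]` holds the rth difference launched from index k, so each entry can be compared with `forward_difference(y, k, r)`, which uses only the ordinates and no lower-order differences.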

From the relations of (5.2.11), we may properly deduce the relations 

\[
E - \Delta = 1 \qquad E(1 - \nabla) = 1 \qquad
\left(E^{1/2} - \tfrac12\delta\right)^2 - \tfrac14\delta^2 = 1 \qquad E^{1/2} - \tfrac12\delta - \mu = 0
\]

after which the formal symbolism of elementary algebra suggests the forms 

\[
E = 1 + \Delta \qquad E = \frac{1}{1 - \nabla} \qquad
E^{1/2} = \left(1 + \tfrac14\delta^2\right)^{1/2} + \tfrac12\delta \qquad \mu = \left(1 + \tfrac14\delta^2\right)^{1/2} \tag{5.2.14}
\]

While the first form requires no explanation, the term $1/(1 - \nabla)$ can be 
interpreted at this stage only as representing the inverse of the operator $1 - \nabla$, 
that is, as an alternative notation for the operator $(1 - \nabla)^{-1}$ such that 

\[
(1 - \nabla)(1 - \nabla)^{-1} = 1 \tag{5.2.15}
\]

whereas the derivation of the third form shows that $(1 + \tfrac14\delta^2)^{1/2}$ is to represent 
an operator such that its iterate is the operator $1 + \tfrac14\delta^2$ 


\[
\left[\left(1 + \tfrac14\delta^2\right)^{1/2}\right]^2 = 1 + \tfrac14\delta^2 \tag{5.2.16}
\]

If we now suppose that the function upon which the operations are to be 
effected is a polynomial p(x), of degree n, we may obtain a more useful 
interpretation of these operators. For, if t is a variable, we have the identity 

\[
(1 - t)(1 + t + t^2 + \cdots + t^n) = 1 - t^{n+1}
\]

for any nonnegative integer n. Clearly, it is proper to replace t by $\nabla$ (or by the 
symbol representing any other distributive operator), to give 

\[
(1 - \nabla)(1 + \nabla + \nabla^2 + \cdots + \nabla^n) = 1 - \nabla^{n+1} \tag{5.2.17}
\]

Since the operator $\nabla^{n+1}$ will annihilate p(x), the operator in (5.2.17) is equivalent 
to the unit operator for any such p(x). Since the inverse of $1 - \nabla$ is uniquely 
defined by (5.2.15), it follows that we may write 

\[
(1 - \nabla)^{-1} = 1 + \nabla + \nabla^2 + \cdots + \nabla^n
\]

when only polynomials of degree n or less are to be affected by the operator. 
More generally, we are justified in writing 

\[
E = (1 - \nabla)^{-1} = 1 + \nabla + \nabla^2 + \cdots = \sum_{k=0}^{\infty}\nabla^k \tag{5.2.18}
\]

when the class of all polynomials is included, since the finite number of required 
terms, for which the exponent of $\nabla$ does not exceed the degree of the polynomial, 
is present, and the remaining terms each annihilate that polynomial. 
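
The expansion (5.2.18) can be verified on a specific polynomial with exact arithmetic. The following sketch is an illustration under stated assumptions: the cubic p, the point x, and the spacing h are arbitrary test values, and the names `backward_difference` and `shift_by_nabla_series` are hypothetical.

```python
from fractions import Fraction
from math import comb

def backward_difference(p, x, h, k):
    """k-th backward difference of p at x, as a binomial sum of ordinates."""
    return sum((-1) ** j * comb(k, j) * p(x - j * h) for j in range(k + 1))

def shift_by_nabla_series(p, x, h, degree):
    """Sum nabla^k p(x) for k = 0..degree; higher powers annihilate p."""
    return sum(backward_difference(p, x, h, k) for k in range(degree + 1))

def p(x):
    return 2 * x ** 3 - x + 5       # hypothetical cubic test polynomial

x, h = Fraction(1, 3), Fraction(1, 4)
lhs = p(x + h)                                   # E p(x)
rhs = shift_by_nabla_series(p, x, h, degree=3)   # (1 + nabla + nabla^2 + nabla^3) p(x)
```

Because p is a cubic, the fourth backward difference vanishes identically, so the series terminates after four terms and the two sides agree exactly.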

It is useful to notice that the coefficients in (5.2.18) are those which would 
be present if the superscript —1, denoting the inverse, were interpreted as a 
power and if formal use then were made of the binomial expansion of the 
reciprocal of 1 — V. 

In a similar way, it can be seen that if we retain only the terms which 
involve powers of $\delta$ which do not exceed n in the formal expansion 

\[
\left(1 + \tfrac14\delta^2\right)^{1/2} = 1 + \tfrac18\delta^2 - \tfrac{1}{128}\delta^4 + \cdots
\]

the resultant polynomial in $\delta$ possesses the property that its square differs from 
$1 + \tfrac14\delta^2$ by the product of $\delta^{n+1}$ and a polynomial in $\delta$, and hence is equivalent 
to $1 + \tfrac14\delta^2$ for any nth-degree p(x). Clearly the negative of the indicated 
expansion also has this property. However, the result of applying the expanded 
form of the third relation in (5.2.14) to any arbitrarily chosen function (say a 
constant) shows that the former alternative is the proper one, so that we are 
justified in writing 

\[
E^{1/2} = \left(1 + \tfrac14\delta^2\right)^{1/2} + \tfrac12\delta = 1 + \tfrac12\delta + \tfrac18\delta^2 - \tfrac{1}{128}\delta^4 + \cdots \tag{5.2.19}
\]

when we deal only with polynomials. 

It then follows that we may write 

\[
E^s = (1 + \Delta)^s = (1 - \nabla)^{-s} = \left[\left(1 + \tfrac14\delta^2\right)^{1/2} + \tfrac12\delta\right]^{2s} \tag{5.2.20}
\]


when s is any integer, where each right-hand member may be expanded in a 
series of powers of the relevant reductive operator when we are dealing only 
with polynomials. The extension to the more general case when s is any rational 
number offers no difficulties. It is also possible to give a rigorous direct justifica- 
tion of the use of these expansions when s is irrational although the required 
arguments are somewhat more subtle. 
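
The case s = 1/2 of (5.2.20), written out in (5.2.19), lends itself to a direct check. The sketch below is only an illustration: it assumes a fifth-degree test polynomial, for which the terms shown in (5.2.19) through the fourth central difference already suffice because the series contains no third- or fifth-difference term and the sixth difference vanishes. The names `central_difference` and `half_step` are hypothetical.

```python
from fractions import Fraction
from math import comb

def central_difference(p, x, h, r):
    """r-th central difference of p at x: sum of (-1)^j C(r, j) p(x + (r/2 - j) h)."""
    return sum((-1) ** j * comb(r, j) * p(x + (Fraction(r, 2) - j) * h)
               for j in range(r + 1))

def half_step(p, x, h):
    """Apply 1 + delta/2 + delta^2/8 - delta^4/128, the terms of (5.2.19) through delta^4."""
    return (p(x)
            + Fraction(1, 2) * central_difference(p, x, h, 1)
            + Fraction(1, 8) * central_difference(p, x, h, 2)
            - Fraction(1, 128) * central_difference(p, x, h, 4))

def p(x):
    return x ** 5 - 3 * x ** 4 + x ** 2 - 7    # hypothetical quintic test polynomial

x, h = Fraction(1, 3), Fraction(1, 2)
shifted = half_step(p, x, h)                   # should reproduce p(x + h/2)
```

For this polynomial the truncated operator reproduces the half-step shift exactly, which is the sense in which the expansion is said to hold when only polynomials are involved.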

The first two equivalences in (5.2.20) are, in fact, seen to be symbolic 
representations of the relations 

\[
p(x_0 + sh) = \sum_{k=0}^{\infty}\binom{s}{k}\Delta^k p(x_0) = \sum_{k=0}^{\infty}(-1)^k\binom{-s}{k}\nabla^k p(x_0) \tag{5.2.21}
\]

to which the previously established Newton forward- and backward-difference 
formulas (4.3.11) and (4.3.13) reduce when f(x) is replaced by a polynomial 
p(x), since only a finite number of terms then do not vanish and since the 
remainder term also vanishes. This fact can be considered as constituting an 
indirect proof of the validity of the first two relations of (5.2.20) for a 
polynomial, when s is unrestricted. 
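
Both sums in (5.2.21) can be evaluated for a nonintegral s. The following sketch is an illustration under stated assumptions: the generalized binomial coefficient is taken as s(s − 1)···(s − k + 1)/k!, the cubic p and the values of x0, h, and s are arbitrary test choices, and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial(s, k):
    """Generalized binomial coefficient s(s-1)...(s-k+1)/k! for rational s."""
    result = Fraction(1)
    for j in range(k):
        result *= (s - j) / Fraction(j + 1)
    return result

def forward_difference(p, x, h, k):
    return sum((-1) ** j * comb(k, j) * p(x + (k - j) * h) for j in range(k + 1))

def backward_difference(p, x, h, k):
    return sum((-1) ** j * comb(k, j) * p(x - j * h) for j in range(k + 1))

def newton_forward(p, x0, h, s, degree):
    """First sum of (5.2.21), stopped at the degree of the polynomial."""
    return sum(binomial(s, k) * forward_difference(p, x0, h, k)
               for k in range(degree + 1))

def newton_backward(p, x0, h, s, degree):
    """Second sum of (5.2.21), stopped at the degree of the polynomial."""
    return sum((-1) ** k * binomial(-s, k) * backward_difference(p, x0, h, k)
               for k in range(degree + 1))

def p(x):
    return x ** 3 - 4 * x ** 2 + 2      # hypothetical cubic test polynomial

x0, h, s = Fraction(1), Fraction(1, 2), Fraction(5, 3)
target = p(x0 + s * h)
```

For a cubic both sums terminate after the third difference, and each then reproduces `target` exactly even though s is not an integer.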

As was discussed in Sec. 4.11, the series in (5.2.21) frequently do not 
converge when p(x) is replaced by a function f(x) other than a polynomial; 
they must be truncated, say, after n + 1 terms, and the appropriate error term 
(4.3.6) or (4.3.9) then must be added. However, the coefficients in the formula 
are not dependent upon the nature of f(x), and the present operational methods 
serve to determine those coefficients in a simple and systematic way. 

The equivalence of the extreme members of (5.2.20) can be expressed in a 
variety of forms, such as 

\[
\begin{aligned}
E^s &= \left[\left(1 + \tfrac14\delta^2\right)^{1/2} + \tfrac12\delta\right]^{2s} = \left[1 + \tfrac12\delta^2 + \delta\left(1 + \tfrac14\delta^2\right)^{1/2}\right]^{s} \\
&= \left(1 + \tfrac12\delta^2 + \mu\delta\right)^{s} = \left(1 + E^{1/2}\delta\right)^{s} = \left(1 - E^{-1/2}\delta\right)^{-s}
\end{aligned}
\tag{5.2.22}
\]

The operational formula obtained by expanding the first or second of these 
expressions would correspond to an interpolation employing the central 
differences $\delta^{2r+1}f(x_0)$ as well as the central differences of even order. Since 
the former differences generally are not available in tabular work, this formula 
would be of limited use. The Stirling formula could be obtained by expanding 
the third expression and afterward replacing $\mu^{2k}$ by $(1 + \delta^2/4)^k$ and $\mu^{2k+1}$ 
by $\mu(1 + \delta^2/4)^k$. The two gaussian formulas could be obtained from the 
remaining two expressions. Since the results have been obtained in the preceding 
chapter (see also Prob. 5), these calculations are omitted here. However, the use 
of operational methods in deriving formulas for interpolation in two-way 
tables is illustrated in Probs. 3 to 6. The derivations of formulas for other 
purposes are indicated in the following sections. 

In the remainder of this chapter we shall proceed, in general, as though 
we were concerned only with polynomials, and often we shall emphasize this 
fact by writing p(x) in place of f(x). Formulas to be obtained then will have 
been established as identities for any polynomial p(x), in which case all relevant 
series of operations by reductive operators will terminate. The determination 
of the error term to be introduced when the formula is applied to a function of 
more general type, after a truncation of the series involved, is then to be con- 
sidered in each case as a separate problem. 


5.3 Differentiation Formulas 


In order to obtain formulas for numerical differentiation, by operational 
methods, it is necessary to relate D to other reductive operators. For this 
purpose, we notice that the familiar formula of the Taylor-series expansion 


$$p(x + h) = p(x) + \frac{h}{1!}\,p'(x) + \frac{h^2}{2!}\,p''(x) + \cdots + \frac{h^n}{n!}\,p^{(n)}(x) + \cdots \tag{5.3.1}$$


(which certainly is valid for any polynomial) can be written in the operational 
form 

$$E\,p(x) = \left(1 + \frac{hD}{1!} + \frac{h^2D^2}{2!} + \cdots + \frac{h^nD^n}{n!} + \cdots\right) p(x)$$


Since the series in parentheses is the formal expansion of the function $e^{hD}$, we 
deduce the curious relationship 

$$E = e^{hD} \tag{5.3.2}$$


which is to be interpreted as an abbreviation of the statement that the operators 
$E$ and $1 + hD/1! + \cdots + (hD)^n/n!$ are equivalent when applied to any 
polynomial p(x) of degree n, for any n. 

Further, we may obtain the additional relations 


$$\begin{aligned}
hD &= \log E = \log(1 + \Delta) = -\log(1 - \nabla) \\
&= 2\log\left[\left(1 + \tfrac14\delta^2\right)^{1/2} + \tfrac12\delta\right] = 2\sinh^{-1}\frac{\delta}{2}
\end{aligned} \tag{5.3.3}$$


Here, for example, the symbolic relation $hD = \log(1 + \Delta)$ asserts that the 
operators $hD$ and $P_n(\Delta) = \Delta - \Delta^2/2 + \cdots + (-1)^{n+1}\Delta^n/n$ are equivalent 
for any nth-degree p(x). Its validity can be verified directly by noticing that 
since $\Delta$ and $hD/1! + \cdots + (hD)^n/n!$ have been shown to be equivalent for any 
such p(x), we may replace $\Delta$ by this last operator in the polynomial $P_n(\Delta)$. 
The result will differ from $hD$ by a polynomial of the form $a_{n+1}(hD)^{n+1} + \cdots + 
a_{n^2}(hD)^{n^2}$, which will annihilate p(x). 

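In terms of forward differences, we thus deduce the formula 


$$p_0' = \frac{1}{h}\log(1 + \Delta)\,p_0 = \frac{1}{h}\left(\Delta - \tfrac12\Delta^2 + \tfrac13\Delta^3 - \cdots\right)p_0 \tag{5.3.4}$$

By iteration, there follows also 


$$\begin{aligned}
p_0^{(r)} &= \frac{1}{h^r}\left[\log(1 + \Delta)\right]^r p_0 \\
&= \frac{1}{h^r}\left(1 - \tfrac12\Delta + \tfrac13\Delta^2 - \cdots\right)^r \Delta^r p_0 \\
&= \frac{1}{h^r}\left[\Delta^r - \frac{r}{2}\Delta^{r+1} + \frac{r(3r + 5)}{24}\Delta^{r+2} - \frac{r(r + 2)(r + 3)}{48}\Delta^{r+3} + \cdots\right] p_0
\end{aligned} \tag{5.3.5}$$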
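As a minimal sketch of how (5.3.4) is used, the following Python fragment forms the leading forward differences of a tabulated polynomial and sums the logarithmic series. The function names and the sample cubic are illustrative assumptions, not part of the text; exact rational arithmetic is used so that the terminating series reproduces the derivative exactly.

```python
from fractions import Fraction

def forward_differences(values):
    """Return [p_0, Δp_0, Δ²p_0, ...] from equally spaced samples."""
    leading = []
    row = list(values)
    while row:
        leading.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return leading

def derivative_at_x0(values, h):
    """Apply (5.3.4): p'_0 = (1/h)(Δ - Δ²/2 + Δ³/3 - ...) p_0."""
    diffs = forward_differences(values)
    total = Fraction(0)
    for k in range(1, len(diffs)):
        total += Fraction((-1) ** (k + 1), k) * diffs[k]
    return total / h

h = Fraction(1, 10)
p = lambda x: x**3 - 2 * x                    # illustrative cubic, p'(x) = 3x² - 2
samples = [p(1 + k * h) for k in range(4)]    # x_0 = 1; four points give Δ, Δ², Δ³
approx = derivative_at_x0(samples, h)         # all higher differences vanish
```

Because the fourth and higher differences of a cubic vanish, the truncated series here is an identity rather than an approximation; for a general f(x) the same code gives only the truncated formula, to which an error term must be added.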


The coefficients in this expansion are expressible in terms of the so-called 
Stirling numbers of the first kind, which may be denoted by $S_k^{(r)}$, and which are 
then defined by the relation 

$$\frac{[\log(1 + t)]^r}{r!} = \sum_{k=r}^{\infty} \frac{S_k^{(r)}}{k!}\,t^k \tag{5.3.6}$$

when $|t|$ is small, so that (5.3.5) can be written in the form 

$$p_0^{(r)} = \frac{1}{h^r}\left[S_r^{(r)} + \frac{S_{r+1}^{(r)}}{r+1}\,\Delta + \frac{S_{r+2}^{(r)}}{(r+1)(r+2)}\,\Delta^2 + \cdots\right]\Delta^r p_0 \tag{5.3.7}$$

In a completely similar way, the corresponding backward-difference 
formulas are obtained in the form 


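$$p_N' = -\frac{1}{h}\log(1 - \nabla)\,p_N = \frac{1}{h}\left(\nabla + \tfrac12\nabla^2 + \tfrac13\nabla^3 + \cdots\right)p_N \tag{5.3.8}$$

and 

$$\begin{aligned}
p_N^{(r)} &= \frac{1}{h^r}\left[\nabla^r + \frac{r}{2}\nabla^{r+1} + \frac{r(3r + 5)}{24}\nabla^{r+2} + \frac{r(r + 2)(r + 3)}{48}\nabla^{r+3} + \cdots\right] p_N \\
&= \frac{1}{h^r}\left[S_r^{(r)} - \frac{S_{r+1}^{(r)}}{r+1}\,\nabla + \frac{S_{r+2}^{(r)}}{(r+1)(r+2)}\,\nabla^2 - \cdots\right]\nabla^r p_N
\end{aligned} \tag{5.3.9}$$

From the last form of (5.3.3), there follows symbolically 

$$p_0' = \left[\frac{2}{h}\sinh^{-1}\frac{\delta}{2}\right] p_0 \tag{5.3.10}$$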


and several types of central-difference expansions are possible. Since the 
right-hand member is an odd function of $\delta$, its expansion in powers of $\delta$ would involve 
odd central differences, which are not generally useful in tabular interpolation. 

To obtain a result equivalent to that of differentiating the Stirling formula, 
and evaluating the result at $x_0$, we require an expansion involving mean odd 
central differences. Hence, by multiplying the right-hand member by $\mu$ and 
dividing by its equivalent $\sqrt{1 + \delta^2/4}$, we obtain the form† 

$$\begin{aligned}
p_0' &= \frac{2\mu}{h}\,\frac{\sinh^{-1}(\delta/2)}{\sqrt{1 + \delta^2/4}}\;p_0 \\
&= \frac{\mu}{h}\left(\delta - \frac{1^2}{3!}\,\delta^3 + \frac{1^2\cdot 2^2}{5!}\,\delta^5 - \cdots\right) p_0
\end{aligned} \tag{5.3.11}$$


This formula is useful for calculating the derivative at interior tabular points, 
whereas (5.3.4) and (5.3.8) would be required at end points of the tabulation. 
Intermediate values are then conveniently interpolated from these values. 

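Higher derivatives of even order 2m are obtained by use of the operator 
$D^{2m}$, where D is expressed as in (5.3.10), 


$$\begin{aligned}
D &= \frac{2}{h}\sinh^{-1}\frac{\delta}{2} \\
&= \frac{1}{h}\left(\delta - \frac{1^2}{2^2\cdot 3!}\,\delta^3 + \frac{1^2\cdot 3^2}{2^4\cdot 5!}\,\delta^5 - \frac{1^2\cdot 3^2\cdot 5^2}{2^6\cdot 7!}\,\delta^7 + \cdots\right)
\end{aligned} \tag{5.3.12}$$


whereas higher derivatives of odd order 2m + 1 are obtained by multiplying 
the operator in (5.3.11) by the expansion of $D^{2m}$. Thus, for example, we may 
obtain the formula 


$$\begin{aligned}
p_0'' &= \frac{1}{h^2}\left(\delta - \frac{1^2}{2^2\cdot 3!}\,\delta^3 + \frac{1^2\cdot 3^2}{2^4\cdot 5!}\,\delta^5 - \frac{1^2\cdot 3^2\cdot 5^2}{2^6\cdot 7!}\,\delta^7 + \cdots\right)^2 p_0 \\
&= \frac{1}{h^2}\left(\delta^2 - \tfrac{1}{12}\delta^4 + \tfrac{1}{90}\delta^6 - \tfrac{1}{560}\delta^8 + \cdots\right) p_0
\end{aligned} \tag{5.3.13}$$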


† More properly, we should operate on the right-hand member by $1 = \mu\mu^{-1} = 
\mu(1 + \delta^2/4)^{-1/2}$ to obtain 

$$p_0' = \left[\frac{2}{h}\,\mu\left(1 + \frac{\delta^2}{4}\right)^{-1/2}\sinh^{-1}\frac{\delta}{2}\right] p_0$$

and then effect the desired expansion in powers of $\delta$. But since the coefficients in 
that expansion are independent of the order of the factors, the fact that some ordering 
is justifiable permits us to ignore the noncommutativity of $\mu^{-1}$ with other factors 
when we actually determine those coefficients. Similar comments are applicable to 
other such operational manipulations in what follows. 
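Other formulas obtainable in this way may be listed as follows: 


$$p_0''' = \frac{\mu}{h^3}\left(\delta^3 - \tfrac14\delta^5 + \tfrac{7}{120}\delta^7 - \cdots\right) p_0 \tag{5.3.14}$$
$$p_0^{\mathrm{iv}} = \frac{1}{h^4}\left(\delta^4 - \tfrac16\delta^6 + \tfrac{7}{240}\delta^8 - \cdots\right) p_0 \tag{5.3.15}$$
$$p_0^{\mathrm{v}} = \frac{\mu}{h^5}\left(\delta^5 - \tfrac13\delta^7 + \cdots\right) p_0 \tag{5.3.16}$$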
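The numerical coefficients in (5.3.13) and (5.3.15) follow from squaring the series (5.3.12). A short Python sketch of that bookkeeping is given here, under the assumption that each series is stored as a list of exact rational coefficients; the helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def hD_coefficients(terms):
    """Coefficients a_j of δ^(2j+1) in hD = 2 asinh(δ/2), as in (5.3.12)."""
    coeffs = []
    odd_product = 1                    # 1·3·5···(2j-1); empty product for j = 0
    for j in range(terms):
        if j > 0:
            odd_product *= 2 * j - 1
        coeffs.append(Fraction((-1) ** j * odd_product**2,
                               2 ** (2 * j) * factorial(2 * j + 1)))
    return coeffs

def multiply(a, b, terms):
    """Truncated Cauchy product of two coefficient lists."""
    return [sum(a[i] * b[j - i] for i in range(j + 1)) for j in range(terms)]

a = hD_coefficients(4)
second = multiply(a, a, 4)             # coefficients of δ², δ⁴, δ⁶, δ⁸ in (hD)²
fourth = multiply(second, second, 3)   # coefficients of δ⁴, δ⁶, δ⁸ in (hD)⁴
```

Here `second` holds 1, -1/12, 1/90, -1/560 and `fourth` holds 1, -1/6, 7/240, in agreement with the listed formulas; the odd-order formulas would additionally require the series of (5.3.11).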


The error term to be introduced in each case, when p(x) is replaced by a 
function f(x) which is not a polynomial, so that the series must be terminated 
with, say, nth differences, can be determined by use of the results of Sec. 3.3 
if it is noticed that the result of this truncation corresponds to the differentiation 
of a polynomial of degree n which agrees with f(x) at $x_0, x_1, \ldots, x_n$ in the case 
of (5.3.5), at $x_N, x_{N-1}, \ldots, x_{N-n}$ in the case of (5.3.9), and at $x_0, x_{\pm 1}, \ldots, 
x_{\pm m}$ in the case of the central-difference formulas, where m = n/2 if n is even, 
and m = (n + 1)/2 if n is odd. In the case of the forward- and backward-difference 
formulas, when the differentiation is effected at an end point of the 
range, formula (3.3.20) is valid. However, in the central-difference formulas, 
the more complicated formula (3.3.15) cannot be avoided. In practice, unless 
the analytical definition of f(x) is known and is of sufficiently simple form to 
permit the determination and estimation of corresponding analytical expressions 
for the higher derivatives involved in those error terms, one generally must 
estimate the error by considering the magnitude of the first term omitted, 
realizing that this estimate is not necessarily a dependable one. The importance 
of inherent errors in numerical differentiation has already been emphasized in 
Sec. 3.8. 

Formulas for differentiation at a point midway between two tabular 
points are obtained by writing D in the form (5.3.12) and operating on $p_{1/2}$. 
In addition, in order to obtain ordinary odd central differences and mean even 
central differences at $s = \tfrac12$, we must multiply the expansion by the unit operator 
$\mu/\sqrt{1 + \delta^2/4}$ in calculating derivatives of even order, whereas in the preceding 
case this device was necessary when calculating derivatives of odd order. Thus 
we have, symbolically, 


$$p_{1/2}^{(2m)} = \frac{\mu}{\sqrt{1 + \delta^2/4}}\left[\frac{2}{h}\sinh^{-1}\frac{\delta}{2}\right]^{2m} p_{1/2} \tag{5.3.17}$$


and 


$$p_{1/2}^{(2m+1)} = \left[\frac{2}{h}\sinh^{-1}\frac{\delta}{2}\right]^{2m+1} p_{1/2} \tag{5.3.18}$$
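In particular, when m = 0 in (5.3.17), we thus rederive (4.6.8) in the form 


$$p_{1/2} = \mu\left(1 - \tfrac18\delta^2 + \tfrac{3}{128}\delta^4 - \tfrac{5}{1024}\delta^6 + \tfrac{35}{32768}\delta^8 - \cdots\right) p_{1/2} \tag{5.3.19}$$


and obtain also derivative formulas which may be listed as follows: 


$$p_{1/2}' = \frac{1}{h}\left(\delta - \tfrac{1}{24}\delta^3 + \tfrac{3}{640}\delta^5 - \tfrac{5}{7168}\delta^7 + \cdots\right) p_{1/2} \tag{5.3.20}$$
$$p_{1/2}'' = \frac{\mu}{h^2}\left(\delta^2 - \tfrac{5}{24}\delta^4 + \tfrac{259}{5760}\delta^6 - \tfrac{3229}{322560}\delta^8 + \cdots\right) p_{1/2} \tag{5.3.21}$$
$$p_{1/2}''' = \frac{1}{h^3}\left(\delta^3 - \tfrac18\delta^5 + \tfrac{37}{1920}\delta^7 - \cdots\right) p_{1/2} \tag{5.3.22}$$
$$p_{1/2}^{\mathrm{iv}} = \frac{\mu}{h^4}\left(\delta^4 - \tfrac{7}{24}\delta^6 + \tfrac{47}{640}\delta^8 - \cdots\right) p_{1/2} \tag{5.3.23}$$
$$p_{1/2}^{\mathrm{v}} = \frac{1}{h^5}\left(\delta^5 - \tfrac{5}{24}\delta^7 + \cdots\right) p_{1/2} \tag{5.3.24}$$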


In certain applications, it is desirable to express differences at a point in 
terms of derivatives at that point. This is the inverse of the problem just con- 
sidered. Thus, in order to express forward differences in terms of derivatives, 
we again refer to (5.3.3) and obtain the relation 

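$$\Delta^r = \left(e^{hD} - 1\right)^r = \left(hD + \frac{h^2D^2}{2!} + \frac{h^3D^3}{3!} + \cdots\right)^r \tag{5.3.25}$$


Thus there follows 


$$\Delta^r p_0 = \left[(hD)^r + \frac{r}{2}(hD)^{r+1} + \frac{r(3r + 1)}{24}(hD)^{r+2} + \frac{r^2(r + 1)}{48}(hD)^{r+3} + \cdots\right] p_0 \tag{5.3.26}$$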


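The coefficients in this formula are expressible in terms of the so-called 
Stirling numbers of the second kind, which may be denoted by $\mathscr{S}_k^{(r)}$, and are 
then defined by the relation 

$$\frac{(e^t - 1)^r}{r!} = \sum_{k=r}^{\infty} \frac{\mathscr{S}_k^{(r)}}{k!}\,t^k \tag{5.3.27}$$


when $|t|$ is small, so that (5.3.26) can be written in the form 


$$\Delta^r p_0 = \left[\mathscr{S}_r^{(r)} + \frac{\mathscr{S}_{r+1}^{(r)}}{r+1}\,(hD) + \frac{\mathscr{S}_{r+2}^{(r)}}{(r+1)(r+2)}\,(hD)^2 + \cdots\right](hD)^r p_0 \tag{5.3.28}$$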
By comparing the relation 

$$(-\nabla)^r = \left(e^{-hD} - 1\right)^r \tag{5.3.29}$$

with (5.3.25), we see that a corresponding formula for backward differences can 
be obtained by replacing $\Delta$ by $-\nabla$ and $D$ by $-D$ in (5.3.26) or (5.3.28). 


Similar formulas involving central differences are readily obtained from 
the relations 


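$$\delta = 2\sinh\frac{hD}{2} \qquad \mu = \cosh\frac{hD}{2} \qquad \mu\delta = \sinh hD \tag{5.3.30}$$


Thus, for example, there follows 


$$\mu\delta\,p_0 = \left[(hD) + \tfrac16(hD)^3 + \tfrac{1}{120}(hD)^5 + \cdots\right] p_0 \tag{5.3.31}$$

and 

$$\delta^2 p_0 = \left[(hD)^2 + \tfrac{1}{12}(hD)^4 + \tfrac{1}{360}(hD)^6 + \cdots\right] p_0 \tag{5.3.32}$$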


5.4 Newtonian Integration Formulas 


For the purpose of obtaining formulas for numerical integration, we may make 
use of (5.2.10) 


$$J = \Delta D^{-1} \tag{5.4.1}$$


combined with one of the relations of (5.3.3). Thus, to obtain a formula 
involving forward differences for the approximation of the integral 


$$\int_{x_0}^{x_0 + rh} f(x)\,dx$$


we may notice first that, when f(x) is a polynomial p(x), this integral can be 
expressed as 

$$(1 + E + E^2 + \cdots + E^{r-1})\,J p_0 = \frac{E^r - 1}{E - 1}\,J p_0$$


where it is assumed that the operators are destined to be expanded in series of 
powers of a reductive operator. Hence, by expressing E and D in terms of $\Delta$, 
there follows, symbolically, 


$$\int_{x_0}^{x_0 + rh} p(x)\,dx = h\left[\frac{(1 + \Delta)^r - 1}{\Delta}\right]\left[\frac{\Delta}{\log(1 + \Delta)}\right] p_0 \tag{5.4.2}$$


The expansion of the first operator is easily found to be 


$$\frac{(1 + \Delta)^r - 1}{\Delta} = \sum_{i=0}^{\infty} \binom{r}{i+1}\Delta^i \tag{5.4.3}$$
whereas the expansion of the second factor may be written in the form 


$$\frac{\Delta}{\log(1 + \Delta)} = \sum_{j=0}^{\infty} c_j \Delta^j \tag{5.4.4}$$

the first nine coefficients of which are 

$$\begin{aligned}
c_0 &= 1 & c_1 &= \tfrac12 & c_2 &= -\tfrac{1}{12} \\
c_3 &= \tfrac{1}{24} & c_4 &= -\tfrac{19}{720} & c_5 &= \tfrac{3}{160} \\
c_6 &= -\tfrac{863}{60480} & c_7 &= \tfrac{275}{24192} & c_8 &= -\tfrac{33953}{3628800}
\end{aligned} \tag{5.4.5}$$
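The coefficients (5.4.5) can be regenerated by requiring that the product of the series (5.4.4) with the known expansion $\log(1 + \Delta)/\Delta = 1 - \Delta/2 + \Delta^2/3 - \cdots$ be unity. The following Python sketch carries out that term-by-term solution in exact arithmetic; the function name is an assumption made for illustration.

```python
from fractions import Fraction

def gregory_coefficients(count):
    """Coefficients c_j of Δ / log(1 + Δ) = Σ c_j Δ^j.

    Uses log(1 + Δ)/Δ = Σ (-1)^i Δ^i / (i + 1) and solves the
    Cauchy-product identity (Σ c_j Δ^j)(log(1 + Δ)/Δ) = 1 term by term.
    """
    c = [Fraction(1)]
    for j in range(1, count):
        acc = sum(c[j - i] * Fraction((-1) ** i, i + 1) for i in range(1, j + 1))
        c.append(-acc)
    return c

c = gregory_coefficients(9)   # c[0], ..., c[8], to be compared with (5.4.5)
```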


Hence the operator involved in (5.4.2) can be expressed in the form 


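$$\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} c_j \binom{r}{i+1}\Delta^{i+j} = \sum_{k=0}^{\infty}\left[\sum_{i=0}^{k} c_{k-i}\binom{r}{i+1}\right]\Delta^k \tag{5.4.6}$$


Thus, if we write 


$$a_k^{(r)} = \sum_{i=0}^{k} c_{k-i}\binom{r}{i+1} = c_k\binom{r}{1} + c_{k-1}\binom{r}{2} + \cdots \tag{5.4.7}$$


where the series terminates when the subscript of c vanishes or when the 
arguments of the binomial coefficient become equal, the required formula becomes 


$$\int_{x_0}^{x_0 + rh} p(x)\,dx = h\left(\sum_{k=0}^{\infty} a_k^{(r)}\Delta^k\right) p_0 \tag{5.4.8}$$


In particular, in the case r = 1, there follows $a_k^{(1)} = c_k$ and (5.4.8) 
becomes 

$$\int_{x_0}^{x_0 + h} p(x)\,dx = h\left(1 + \tfrac12\Delta - \tfrac{1}{12}\Delta^2 + \tfrac{1}{24}\Delta^3 - \tfrac{19}{720}\Delta^4 + \tfrac{3}{160}\Delta^5 - \cdots\right) p_0 \tag{5.4.9}$$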


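In the case r = 2, there follows $a_k^{(2)} = 2c_k + c_{k-1}$, and we may obtain the 
formula 


$$\int_{x_0}^{x_0 + 2h} p(x)\,dx = 2h\left(1 + \Delta + \tfrac16\Delta^2 + 0\,\Delta^3 - \tfrac{1}{180}\Delta^4 + \tfrac{1}{180}\Delta^5 - \cdots\right) p_0 \tag{5.4.10}$$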
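Further, in the case r = -1, there follows 

$$a_k^{(-1)} = -c_k + c_{k-1} - c_{k-2} + \cdots + (-1)^{k+1} c_0$$

and (5.4.8) becomes 

$$\int_{x_0 - h}^{x_0} p(x)\,dx = h\left(1 - \tfrac12\Delta + \tfrac{5}{12}\Delta^2 - \tfrac38\Delta^3 + \tfrac{251}{720}\Delta^4 - \tfrac{95}{288}\Delta^5 + \cdots\right) p_0 \tag{5.4.11}$$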
This formula amounts to the result of using the Newton formula to extrapolate 
f(x) = p(x) backward over the interval $[x_0 - h, x_0]$ and integrating the result 
over that interval. 
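The passage from (5.4.7) to the special cases r = 1, 2, and -1 is purely mechanical, and may be sketched in Python as follows. The helper names are assumptions; the binomial coefficient is evaluated by its product formula so that negative r is admissible. Note that for r = -1 the sum in (5.4.8) represents the integral from $x_0$ to $x_0 - h$, so that its coefficients are the negatives of those displayed in (5.4.11).

```python
from fractions import Fraction

def gregory_coefficients(count):
    """Coefficients c_j of Δ / log(1 + Δ), as in (5.4.4)."""
    c = [Fraction(1)]
    for j in range(1, count):
        acc = sum(c[j - i] * Fraction((-1) ** i, i + 1) for i in range(1, j + 1))
        c.append(-acc)
    return c

def binomial(r, n):
    """Generalized binomial coefficient C(r, n) for integer r and n >= 0."""
    result = Fraction(1)
    for i in range(n):
        result *= Fraction(r - i, i + 1)
    return result

def newton_integration_coefficients(r, count):
    """a_k^(r) = Σ_{i=0}^{k} c_{k-i} C(r, i+1), as in (5.4.7)."""
    c = gregory_coefficients(count)
    return [sum(c[k - i] * binomial(r, i + 1) for i in range(k + 1))
            for k in range(count)]

a2 = newton_integration_coefficients(2, 6)         # 2h(...) form of (5.4.10)
a_minus1 = newton_integration_coefficients(-1, 6)  # negatives of (5.4.11)
```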

Similar formulas are easily obtained in terms of backward differences, 
for the purpose of integrating a function over r intervals terminating at the end 
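of a tabulation. It may be seen that the formula for integrating from $x_N - rh$ 
to $x_N$ can be obtained from (5.4.8) by replacing $\Delta$ by $-\nabla$ and $p_0$ by $p_N$. Thus, 
for example, one has 


$$\int_{x_N - h}^{x_N} p(x)\,dx = h\left(1 - \tfrac12\nabla - \tfrac{1}{12}\nabla^2 - \tfrac{1}{24}\nabla^3 - \tfrac{19}{720}\nabla^4 - \tfrac{3}{160}\nabla^5 - \cdots\right) p_N \tag{5.4.12}$$


and 


$$\int_{x_N}^{x_N + h} p(x)\,dx = h\left(1 + \tfrac12\nabla + \tfrac{5}{12}\nabla^2 + \tfrac38\nabla^3 + \tfrac{251}{720}\nabla^4 + \tfrac{95}{288}\nabla^5 + \cdots\right) p_N \tag{5.4.13}$$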


the last formula being useful for integration over an interval beyond the range of 
tabulation and playing an important role in the numerical solution of differential 
equations. 

In each case, the error term to be introduced when p(x) is replaced by 
f(x) and the series is truncated with the nth difference can be expressed in the 
form given by (3.3.5). Thus, in the case of (5.4.8), there follows 


$$E_n = \frac{1}{(n + 1)!}\int_{x_0}^{x_r} (x - x_0)(x - x_1)\cdots(x - x_n)\,f^{(n+1)}(\xi)\,dx \tag{5.4.14}$$

where $\xi$ depends upon x and lies between the smaller of $x_0$ and $x_r$ and the 
larger of $x_n$ and $x_r$, and an analogous term applies to the formula with backward 
differences. 

When r = 1, or when r is a negative integer, the coefficient of $f^{(n+1)}(\xi)$ in 
(5.4.14), which has been denoted by $\pi(x)$, does not change sign in the interval of 
integration. Thus the second law of the mean can then be applied to give the 
more useful form 


$$E_n = \frac{f^{(n+1)}(\xi)}{(n + 1)!}\int_{x_0}^{x_r} \pi(x)\,dx \qquad (r \le 1) \tag{5.4.15}$$

where now $\xi$ is a constant, such that $\min(x_0, x_r) < \xi < \max(x_n, x_r)$. A 
reference to the form of Newton's interpolation formula, from which the 
preceding formulas may be obtained by integration, shows that, when (5.4.15) 
applies, the error term is obtained by replacing $\Delta^k p_0$ or $\nabla^k p_N$ by $h^k f^{(k)}(\xi)$ in the 
first nonvanishing term omitted. Thus, for example, we may deduce from (5.4.12) 
that 


$$\int_{x_N - h}^{x_N} f(x)\,dx = h\left(1 - \tfrac12\nabla - \tfrac{1}{12}\nabla^2\right) f_N - \frac{h^4}{24}\,f'''(\xi)$$


where $x_N - 2h < \xi < x_N$. 

In those cases when n = r, so that the number of differences retained is 
equal to the number of h intervals in the range of integration, the formulas 
reduce to Newton-Cotes formulas when expressed in terms of the ordinates, 
and the error terms can be supplied by reference to the results of Sec. 3.5. Thus, 
for example, we may deduce from (5.4.10) that 


$$\int_{x_0}^{x_0 + 2h} f(x)\,dx = 2h\left(1 + \Delta + \tfrac16\Delta^2\right) f_0 - \frac{h^5}{90}\,f^{\mathrm{iv}}(\xi)$$


where $x_0 < \xi < x_0 + 2h$, and the formula is equivalent to Simpson's rule. 
If the terms involving $\Delta$ and $\Delta^2$ in (5.4.10) are expressed explicitly in 
terms of the ordinates $p_0$, $p_1$, and $p_2$, the result takes the form 


$$\int_{x_0}^{x_0 + 2h} p(x)\,dx = \frac{h}{3}\,(p_0 + 4p_1 + p_2) - \frac{h}{90}\left(\Delta^4 - \Delta^5 + \tfrac{37}{42}\Delta^6 - \cdots\right) p_0 \tag{5.4.16}$$


which may be considered as Simpson's rule with "correction terms" expressed 
in terms of forward differences, for use at the beginning of a tabulation. The 
corresponding formula with backward differences is 


$$\int_{x_N - 2h}^{x_N} p(x)\,dx = \frac{h}{3}\,(p_N + 4p_{N-1} + p_{N-2}) - \frac{h}{90}\left(\nabla^4 + \nabla^5 + \tfrac{37}{42}\nabla^6 + \cdots\right) p_N \tag{5.4.17}$$


5.5 Newtonian Formulas for Repeated Integration 


It frequently happens that the second derivative of a function F(x) is known 

$$F''(x) = f(x) \tag{5.5.1}$$


and that F(x), and perhaps also F'(x), are required at a set of equally spaced 
points $x_0, x_1, \ldots, x_N$, with the values $F(x_0) = F_0$ and $F'(x_0) = F_0'$ prescribed 
in advance. In order to treat this problem operationally, without being 
concerned with remainder terms, we again imagine that F and f are replaced by 
polynomials and denote this fact by writing P and p for F and f, respectively. 

If P''(x) is tabulated at the points $x_0, \ldots, x_N$, it is seen that use may be 
made, say, of (5.4.9), written in the form 


$$P_{k+1}' = P_k' + h\left(1 + \tfrac12\Delta - \tfrac{1}{12}\Delta^2 + \tfrac{1}{24}\Delta^3 - \tfrac{19}{720}\Delta^4 + \tfrac{3}{160}\Delta^5 - \cdots\right) P_k'' \tag{5.5.2}$$


where h is the spacing, to obtain a corresponding tabulation of P'(x), after 
which the same formula may be used again in the form 


$$P_{k+1} = P_k + h\left(1 + \tfrac12\Delta - \tfrac{1}{12}\Delta^2 + \tfrac{1}{24}\Delta^3 - \tfrac{19}{720}\Delta^4 + \tfrac{3}{160}\Delta^5 - \cdots\right) P_k' \tag{5.5.3}$$


to determine the desired tabulation of P(x). Clearly, the formula (5.4.12) 
could be used instead and would be needed near the end of the tabulation if 
the value of $P_k''$ were not available for k > N. 

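This procedure involves the formation of difference arrays relative to 
both P'(x) and P''(x). In order to avoid the necessity of two such arrays, we may 
transform (5.5.1) into the form 


$$P_{k+1} = P_k + hP_k' + \int_{x_k}^{x_{k+1}}\!\int_{x_k}^{x} p(t)\,dt\,dx \tag{5.5.4}$$

and seek an operator $\theta$ such that 

$$\int_{x_k}^{x_{k+1}}\!\int_{x_k}^{x} p(t)\,dt\,dx = \theta p_k \tag{5.5.5}$$


Thus $\theta$ must be such that 

$$(E - 1 - hD)\,P_k = \theta p_k = \theta D^2 P_k$$


and hence 

$$\theta = \frac{E - 1 - hD}{D^2} = h^2\,\frac{E - 1 - \log E}{(\log E)^2} \tag{5.5.6}$$

In terms of the operator $\Delta$, there then follows 

$$\begin{aligned}
\theta &= h^2\,\frac{\Delta - \log(1 + \Delta)}{[\log(1 + \Delta)]^2} = h^2\,\frac{\Delta - \log(1 + \Delta)}{\Delta^2}\left[\frac{\Delta}{\log(1 + \Delta)}\right]^2 \\
&= h^2\left(\tfrac12 - \tfrac13\Delta + \tfrac14\Delta^2 - \cdots\right)\left(1 + \tfrac12\Delta - \tfrac{1}{12}\Delta^2 + \cdots\right)^2 \\
&= h^2\left(\tfrac12 + \tfrac16\Delta - \tfrac{1}{24}\Delta^2 + \tfrac{1}{45}\Delta^3 - \tfrac{7}{480}\Delta^4 + \tfrac{107}{10080}\Delta^5 - \cdots\right)
\end{aligned} \tag{5.5.7}$$


so that (5.5.4) takes the form 


$$P_{k+1} = P_k + hP_k' + h^2\left(\tfrac12 + \tfrac16\Delta - \tfrac{1}{24}\Delta^2 + \tfrac{1}{45}\Delta^3 - \tfrac{7}{480}\Delta^4 + \tfrac{107}{10080}\Delta^5 - \cdots\right) P_k'' \tag{5.5.8}$$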
Thus, if (5.5.2) and (5.5.8) are used for the calculation of values of P'(x) and 
P(x), only the differences of P''(x) are needed. In a similar way, the formula 

$$P_{k+1} = P_k + hP_{k+1}' - h^2\left(\tfrac12 - \tfrac16\nabla - \tfrac{1}{24}\nabla^2 - \tfrac{1}{45}\nabla^3 - \tfrac{7}{480}\nabla^4 - \tfrac{107}{10080}\nabla^5 - \cdots\right) P_{k+1}'' \tag{5.5.9}$$


can be derived for use near the end of a tabulation of P''(x), in conjunction with 
(5.4.12), written in the form 


$$P_{k+1}' = P_k' + h\left(1 - \tfrac12\nabla - \tfrac{1}{12}\nabla^2 - \tfrac{1}{24}\nabla^3 - \tfrac{19}{720}\nabla^4 - \tfrac{3}{160}\nabla^5 - \cdots\right) P_{k+1}'' \tag{5.4.12$'$}$$


In those cases when values of P’ are not required, we may derive a more 
useful formula by noticing that 


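$$\Delta^2 D^{-2} P''(x) = \Delta^2 P(x) \tag{5.5.10}$$


where the factor $\Delta^2$ is inserted to annihilate the arbitrary linear function of x 
which would correspond to the (improper) inverse operator $D^{-2}$. Hence there 
follows also 


$$\begin{aligned}
\Delta^2 P_k &= h^2\left[\frac{\Delta}{\log(1 + \Delta)}\right]^2 P_k'' \\
&= h^2\left(1 + \Delta + \tfrac{1}{12}\Delta^2 + 0\,\Delta^3 - \tfrac{1}{240}\Delta^4 + \tfrac{1}{240}\Delta^5 - \cdots\right) P_k''
\end{aligned} \tag{5.5.11}$$


Thus, since $\Delta^2 P_k = P_{k+2} - 2P_{k+1} + P_k$, this formula permits the 
determination of $P_{k+2}$ from two preceding values of P. 

A corresponding expansion involving backward differences is obtained 
by replacing $\Delta$ by $-\nabla$ in the form 


$$\nabla^2 P_k = h^2\left(1 - \nabla + \tfrac{1}{12}\nabla^2 - 0\,\nabla^3 - \tfrac{1}{240}\nabla^4 - \tfrac{1}{240}\nabla^5 - \cdots\right) P_k'' \tag{5.5.12}$$


This formula determines $P_k$ from $P_{k-1}$ and $P_{k-2}$ and makes use of $P_k''$. Another 
formula, in which only preceding values of P'' are needed, is obtained by 
operating on both sides of (5.5.12) by E, and replacing E by $(1 - \nabla)^{-1}$ in the 
right-hand member, to give 


$$\begin{aligned}
\nabla^2 P_{k+1} &= h^2\left(1 + \nabla + \nabla^2 + \cdots\right)\left(1 - \nabla + \tfrac{1}{12}\nabla^2 + \cdots\right) P_k'' \\
&= h^2\left(1 + 0\,\nabla + \tfrac{1}{12}\nabla^2 + \tfrac{1}{12}\nabla^3 + \tfrac{19}{240}\nabla^4 + \tfrac{3}{40}\nabla^5 + \cdots\right) P_k''
\end{aligned} \tag{5.5.13}$$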


In fact, a whole series of formulas of either type can be obtained, for 
example, by operating on both members of (5.5.12) or (5.5.13) by $a_0 + a_1\nabla + 
a_2\nabla^2 + \cdots$, where the a's are arbitrary constants. Such formulas are particularly 
useful in the numerical solution of differential equations (see Sec. 6.10), which 
include (5.5.1) as a very special case. 

In order to illustrate the use of these formulas in connection with (5.5.1), 
we consider a simple example. It is supposed that the values of F'' listed in the 
following table are known and that the values F(1) = 0 and F'(1) = 1 are 
prescribed: 


  x      F        F'       F''     ΔF''   Δ²F''   Δ³F''
 1.0   0        1.000    1.000
                                   331
 1.1   0.1055   1.1160   1.331             66
                                   397              6
 1.2   0.2244            1.728             72
                                   469              6
 1.3   0.3606            2.197             78
                                   547              6
 1.4                     2.744             84
                                   631
 1.5                     3.375


In order to determine $F_1 = F(1.1)$ and $F_1' = F'(1.1)$, we use the approximate 
relations resulting from replacing P by F in (5.5.8) and (5.5.2): 


$$F_1 \approx 0 + (0.1)(1) + 0.01\left[\tfrac12(1.000) + \tfrac16(0.331) - \tfrac{1}{24}(0.066) + \tfrac{1}{45}(0.006)\right] = 0.1055$$
$$F_1' \approx 1 + 0.1\left[1.000 + \tfrac12(0.331) - \tfrac{1}{12}(0.066) + \tfrac{1}{24}(0.006)\right] = 1.1160$$


Formula (5.5.8) is then used again to determine $F_2$. For the evaluation of $F_3$, 
sufficiently many backward differences are available for the use of (5.5.9) or 
(5.5.12). Hence, unless values of F' are required, $F_2'$ need not be calculated, 
and $F_3$ may be determined by (5.5.12): 


$$\begin{aligned}
F_3 &\approx 2F_2 - F_1 + h^2\left(F_3'' - \nabla F_3'' + \tfrac{1}{12}\nabla^2 F_3''\right) \\
&= 0.3433 + 0.01\left[2.197 - 0.469 + \tfrac{1}{12}(0.072)\right] = 0.3606
\end{aligned}$$


From this stage onward, use may be made exclusively of (5.5.12). 

In this example, the given data are exact values of F'' corresponding to 
$F''(x) = x^3$, from which there follows $F(x) = 0.05x^5 + 0.75x - 0.8$, and 
the results are correct to the places given. Since here the third difference of 
F''(x) is constant, exact values would have been obtained if no intermediate 
roundoffs had been effected. A check on the calculation, which would be useful 
if the last difference retained were not constant, would be afforded by the use of 
(5.5.13). 
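The steps of this example can be retraced in exact arithmetic. The Python sketch below is an illustration only (the function and variable names are assumptions); it applies (5.5.8) and (5.5.2) at the start of the table and (5.5.12) once enough backward differences exist, and since no roundoff is introduced the results coincide with the analytical values.

```python
from fractions import Fraction

h = Fraction(1, 10)
x = [1 + k * h for k in range(6)]            # 1.0, 1.1, ..., 1.5
F2d = [t**3 for t in x]                      # tabulated F''(x) = x^3

def fwd(k, order):
    """Forward difference Δ^order F''_k."""
    if order == 0:
        return F2d[k]
    return fwd(k + 1, order - 1) - fwd(k, order - 1)

def bwd(k, order):
    """Backward difference ∇^order F''_k."""
    if order == 0:
        return F2d[k]
    return bwd(k, order - 1) - bwd(k - 1, order - 1)

def step_value(Fk, dFk, k):
    """(5.5.8) truncated after the third difference."""
    series = (Fraction(1, 2) * fwd(k, 0) + Fraction(1, 6) * fwd(k, 1)
              - Fraction(1, 24) * fwd(k, 2) + Fraction(1, 45) * fwd(k, 3))
    return Fk + h * dFk + h**2 * series

def step_slope(dFk, k):
    """(5.5.2) truncated after the third difference."""
    series = (fwd(k, 0) + Fraction(1, 2) * fwd(k, 1)
              - Fraction(1, 12) * fwd(k, 2) + Fraction(1, 24) * fwd(k, 3))
    return dFk + h * series

F0, dF0 = Fraction(0), Fraction(1)           # prescribed F(1) and F'(1)
F1, dF1 = step_value(F0, dF0, 0), step_slope(dF0, 0)
F2 = step_value(F1, dF1, 1)
# (5.5.12) through the second backward difference (the ∇³ coefficient is zero)
F3 = 2 * F2 - F1 + h**2 * (bwd(3, 0) - bwd(3, 1) + Fraction(1, 12) * bwd(3, 2))

exact = lambda t: Fraction(1, 20) * t**5 + Fraction(3, 4) * t - Fraction(4, 5)
```

Rounded to four places, `F1`, `dF1`, `F2`, and `F3` reproduce the tabulated entries 0.1055, 1.1160, 0.2244, and 0.3606.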


5.6 Central-Difference Integration Formulas 


The most useful integration formulas involving central differences are those in 
which the differences are evaluated at the center of the range of integration, and 
the integral is expressed in the form 

$$\int_{x_0 - mh}^{x_0 + mh} f(x)\,dx$$

In terms of the operator J defined in (5.2.5), this integral can be expressed in the 
symbolic form 


$$\left(E^{-m} + E^{-m+1} + \cdots + E^{m-1}\right) J p_0 = \frac{E^m - E^{-m}}{E - 1}\,J p_0$$


when f(x) is a polynomial p(x), and hence we may write 


$$\int_{x_0 - mh}^{x_0 + mh} p(x)\,dx = 2\,\frac{\sinh mhD}{D}\,p_0 \tag{5.6.1}$$


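In order to obtain an expansion in central differences, we may first obtain 
the expansion 


$$\frac{2}{D}\sinh mhD = 2mh\left[1 + \frac{m^2(hD)^2}{6} + \frac{m^4(hD)^4}{120} + \frac{m^6(hD)^6}{5040} + \cdots\right]$$


and then replace hD by its expansion given in (5.3.12), to give 


$$2mh\left\{1 + \frac{m^2}{6}\left(\delta^2 - \tfrac{1}{12}\delta^4 + \tfrac{1}{90}\delta^6 - \cdots\right) + \frac{m^4}{120}\left(\delta^4 - \tfrac16\delta^6 + \cdots\right) + \frac{m^6}{5040}\left(\delta^6 - \cdots\right) + \cdots\right\}$$


if, say, only coefficients of differences of order less than eight are desired. Hence 
we may obtain the formula 


$$\int_{x_0 - mh}^{x_0 + mh} p(x)\,dx = 2mh\left[1 + \frac{m^2}{6}\,\delta^2 + \frac{m^2(3m^2 - 5)}{360}\,\delta^4 + \frac{m^2(3m^4 - 21m^2 + 28)}{15120}\,\delta^6 + \cdots\right] p_0 \tag{5.6.2}$$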


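In the special cases m = 1 and m = 2, the relevant formulas are of the 
forms 


$$\int_{x_0 - h}^{x_0 + h} p(x)\,dx = 2h\left(1 + \tfrac16\delta^2 - \tfrac{1}{180}\delta^4 + \tfrac{1}{1512}\delta^6 - \tfrac{23}{226800}\delta^8 + \cdots\right) p_0 \tag{5.6.3}$$


and 


$$\int_{x_0 - 2h}^{x_0 + 2h} p(x)\,dx = 4h\left(1 + \tfrac23\delta^2 + \tfrac{7}{90}\delta^4 - \tfrac{2}{945}\delta^6 + \tfrac{13}{56700}\delta^8 - \cdots\right) p_0 \tag{5.6.4}$$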
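Since (5.6.3) is an identity for polynomials, it can be checked directly on a polynomial of degree eight, for which all central differences beyond the eighth vanish. The following Python sketch does so in exact arithmetic; the function names and the test polynomial are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def central_difference(p, x0, h, order):
    """Even-order central difference δ^order p(x0), with order = 2j."""
    j = order // 2
    return sum((-1) ** i * comb(order, i) * p(x0 + (j - i) * h)
               for i in range(order + 1))

def integrate_m1(p, x0, h):
    """(5.6.3) through δ⁸: integral of p over [x0 - h, x0 + h]."""
    coeffs = [Fraction(1), Fraction(1, 6), Fraction(-1, 180),
              Fraction(1, 1512), Fraction(-23, 226800)]
    return 2 * h * sum(c * central_difference(p, x0, h, 2 * j)
                       for j, c in enumerate(coeffs))

p = lambda t: t**8 - 3 * t**5 + t**2      # illustrative polynomial of degree 8
h = Fraction(1, 2)
value = integrate_m1(p, Fraction(0), h)   # integral over [-1/2, 1/2]
```

The exact integral of this polynomial over [-1/2, 1/2] is 1/2304 + 1/12 = 193/2304, which is the value the truncated formula returns.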
Formula (5.6.3) can also be expressed in the form 


$$\int_{x_0 - h}^{x_0 + h} p(x)\,dx = \frac{h}{3}\,(p_{-1} + 4p_0 + p_1) - \frac{h}{90}\left(\delta^4 - \tfrac{5}{42}\delta^6 + \tfrac{23}{1260}\delta^8 - \cdots\right) p_0 \tag{5.6.5}$$

and so considered as Simpson’s rule with “correction terms” expressed in terms 
of central differences. 

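It is known (see Steffensen [1950]) that, if p(x) is replaced by f(x) in 
(5.6.2) and the formula is truncated with the difference of order 2k, then the 
error term to be introduced can be expressed in the convenient form 


$$E = \frac{h^{2k+3} f^{(2k+2)}(\xi)}{(2k + 2)!}\int_{-m}^{m} s^2(s^2 - 1^2)\cdots(s^2 - k^2)\,ds \tag{5.6.6}$$

where $x_0 - mh < \xi < x_0 + mh$ if $k \le m$ and $x_0 - kh < \xi < x_0 + kh$ if 
$k \ge m$. Reference to the Stirling interpolation formula, from which the 
preceding formulas may be obtained by integration, shows that this error term 
is obtained by replacing $\delta^{2k+2} p_0$ by $h^{2k+2} f^{(2k+2)}(\xi)$ in the first nonvanishing 
term omitted. 
An important formula, relating to repeated integration, is obtained by 
noticing that since 


$$\delta^2 D^{-2} P''(x) = \delta^2 P(x)$$

and since we have the expansion 

$$\begin{aligned}
\frac{\delta^2}{D^2} &= h^2\left(1 - \tfrac{1}{12}\delta^2 + \tfrac{1}{90}\delta^4 - \tfrac{1}{560}\delta^6 + \cdots\right)^{-1} \\
&= h^2\left[1 + \left(\tfrac{1}{12}\delta^2 - \tfrac{1}{90}\delta^4 + \tfrac{1}{560}\delta^6 - \cdots\right) + \left(\tfrac{1}{12}\delta^2 - \tfrac{1}{90}\delta^4 + \cdots\right)^2 + \left(\tfrac{1}{12}\delta^2 - \cdots\right)^3 + \cdots\right]
\end{aligned}$$

from (5.3.12) and (5.3.13), there follows 

$$\delta^2 P_k = h^2\left(1 + \tfrac{1}{12}\delta^2 - \tfrac{1}{240}\delta^4 + \tfrac{31}{60480}\delta^6 - \cdots\right) P_k'' \tag{5.6.7}$$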


Because of the fact that only differences of even order are involved, this 
formula is usually preferable to (5.5.12) for advancing a step-by-step double 
integration of a given tabulated function, over the portion of the range in which 
the requisite central differences are available. The formula (5.6.7) also will be 
used in the numerical solution of boundary-value problems governed by certain 
second-order differential equations (Sec. 6.15), whereas the analogous formulas 
(5.5.12) and (5.5.13) are to be used for corresponding initial-value problems 
(Sec. 6.10). 
5.7 Subtabulation 


In some situations it is desirable to determine, from a given difference table 
based on the spacing h, a new set of differences based on a new spacing $\rho h$. 
This problem would occur, for example, if a function were initially tabulated 
for increments of 0.1 in x and it were required to subtabulate the function for 
increments of 0.01, in which case $\rho = 0.1$. Whereas this subtabulation clearly 
could be effected by the use of an appropriate interpolation formula, it is often 
more convenient to form certain new differences, based on the new spacing, and 
to build up the table by addition, as will be illustrated at the end of this section. 
The problem also arises when a finite-difference method is used in a step-by- 
step numerical solution of a differential equation, in which case a halving of the 
interval is desirable when a derivative of the solution being determined begins to 
change too rapidly. 

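In order to obtain formulas for such purposes, we designate the shifting 
operator relative to the new spacing $\rho h$ by $E_1$, and notice that since $E_1$ effects 
a shift of $\rho h$, that is, $\rho$ times the shift h effected by E, there follows 


$$E_1 = E^{\rho}$$


If we designate the forward-difference operator relative to $\rho h$ by $\Delta_1$, there then 
follows 


$$1 + \Delta_1 = (1 + \Delta)^{\rho} \tag{5.7.1}$$

and hence we obtain the desired transformation in the symbolic form 


$$\begin{aligned}
\Delta_1^r &= \left[(1 + \Delta)^{\rho} - 1\right]^r \\
&= \left[\rho\Delta + \frac{\rho(\rho - 1)}{2!}\Delta^2 + \frac{\rho(\rho - 1)(\rho - 2)}{3!}\Delta^3 + \cdots\right]^r
\end{aligned} \tag{5.7.2}$$


The leading terms in this expansion can be obtained in the form 


$$\begin{aligned}
\Delta_1^r = \rho^r\Big\{\Delta^r &+ \frac{r(\rho - 1)}{2}\,\Delta^{r+1} + \frac{r(\rho - 1)}{24}\left[4(\rho - 2) + 3(r - 1)(\rho - 1)\right]\Delta^{r+2} \\
&+ \frac{r(\rho - 1)}{48}\left[2(\rho - 2)(\rho - 3) + 4(r - 1)(\rho - 1)(\rho - 2) + (r - 1)(r - 2)(\rho - 1)^2\right]\Delta^{r+3} + \cdots\Big\}
\end{aligned} \tag{5.7.3}$$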
In particular, in the important case $\rho = \tfrac12$, where the spacing is halved, this 
formula becomes 


$$\Delta_1^r = \frac{1}{2^r}\left[\Delta^r - \frac{r}{4}\,\Delta^{r+1} + \frac{r(r + 3)}{32}\,\Delta^{r+2} - \frac{r(r + 4)(r + 5)}{384}\,\Delta^{r+3} + \cdots\right] \tag{5.7.4}$$


whereas the formula reduces in the case $\rho = \tfrac{1}{10}$ to 


$$\Delta_1^r = \frac{1}{10^r}\left[\Delta^r - \frac{9r}{20}\,\Delta^{r+1} + \frac{3r(27r + 49)}{800}\,\Delta^{r+2} - \frac{3r(81r^2 + 441r + 580)}{16000}\,\Delta^{r+3} + \cdots\right] \tag{5.7.5}$$
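The expansion (5.7.2) lends itself to mechanical evaluation for any particular $\rho$ and r. The Python sketch below (function names are assumptions made for illustration) forms the binomial series of $(1 + \Delta)^{\rho} - 1$, divides out one factor $\Delta$ from each of the r factors, and multiplies the truncated series together in exact arithmetic.

```python
from fractions import Fraction

def binomial_series(rho, terms):
    """Coefficients of Δ^1, Δ^2, ... in (1 + Δ)^rho - 1."""
    coeffs, c = [], Fraction(1)
    for n in range(1, terms + 1):
        c *= (rho - (n - 1)) / n         # C(rho, n) from C(rho, n-1)
        coeffs.append(c)
    return coeffs

def subtabulation_coefficients(rho, r, terms):
    """Coefficients of Δ^r, Δ^(r+1), ... in Δ_1^r = [(1 + Δ)^rho - 1]^r."""
    base = binomial_series(rho, terms)   # base[i] multiplies Δ^(i+1)
    power = [Fraction(1)] + [Fraction(0)] * (terms - 1)
    for _ in range(r):
        power = [sum(power[i] * base[j - i] for i in range(j + 1))
                 for j in range(terms)]
    return power

rho = Fraction(1, 10)
first = subtabulation_coefficients(rho, 1, 4)    # 0.1, -0.045, 0.0285, -0.0206625
second = subtabulation_coefficients(rho, 2, 3)   # 0.01, -0.009, 0.007725
```

With $\rho = 1/10$ these lists agree with (5.7.5) for r = 1 and r = 2; calling the same function with `Fraction(1, 2)` gives the coefficients of (5.7.4).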


In order to illustrate an appropriate technique, we again consider the data 
tabulated in Sec. 4.8, where a difference table is constructed with spacing 
h = 0.1, and suppose that the data are to be subtabulated by tenths, that is, 
with a new spacing 0.01. Here, with $\rho = 0.1$, Eq. (5.7.5) gives the formulas 


$$\begin{aligned}
\Delta_1 &= 0.1\Delta - 0.045\Delta^2 + 0.0285\Delta^3 - 0.0206625\Delta^4 + \cdots \\
\Delta_1^2 &= 0.01\Delta^2 - 0.009\Delta^3 + 0.007725\Delta^4 - \cdots \\
\Delta_1^3 &= 0.001\Delta^3 - 0.00135\Delta^4 + \cdots \\
\Delta_1^4 &= 0.0001\Delta^4 + \cdots
\end{aligned} \tag{5.7.6}$$


through fourth differences, where the coefficients have been expressed exactly, 
for convenient reference. In units of the fifth place, the new forward differences 
relative to x = 1.0 and x = 1.1 are found as follows: 


 x = 1.0:   Δ_1 f = 536.2    Δ_1^2 f = -8.5    Δ_1^3 f = -0.05    Δ_1^4 f = 0.0008 ≈ 0
 x = 1.1:   Δ_1 f = 449.1    Δ_1^2 f = -8.9    Δ_1^3 f = -0.05    Δ_1^4 f = 0.001 ≈ 0

Thus, we may suppose that the third differences are constant (within the 
accuracy indicated) over the first range, and we may set up the leading entry 
of each column, together with the constant third differences, in the following table: 

   x        f          Δ_1 f    Δ_1^2 f   Δ_1^3 f
 1.00   0.84147
                       536.2
 1.01   0.846832                 -8.5
                       527.7               -0.05
 1.02   0.852109                 -8.6
                       519.1               -0.05
 1.03   0.857300                 -8.6
                       510.5               -0.05
 1.04   0.862405                 -8.6
                       501.9               -0.05
 1.05   0.867424                 -8.7
                       493.2               -0.05


The remaining entries are then filled in by addition, proceeding from right to 
left, and the results, when rounded to five places, agree with known rounded values 
of f(x) = sin x. 

The decimal parts of units in the fifth place are retained in order to reduce 
the danger of propagated effects of roundoff errors. Since here the errors are 
propagated to the left, and since (see Sec. 4.9) then errors of magnitude $\varepsilon$ in 
the rth difference could lead to errors of magnitude $2^r\varepsilon$ in the calculated values 
of f, it follows that if no errors of one-half unit are to be so introduced, the 
roundoff errors in the rth differences should be smaller than $2^{-r-2}$ units in 
that place. Hence, for this reason alone, at least one extra place should be 
retained in the intermediate subtabulation of f and in the first two differences, 
two extra places in each of the next three differences, and so forth. 

If backward differences are used, we see that (5.7.1) must be replaced by 


$$1 - \nabla_1 = (1 - \nabla)^{\rho}$$


and hence all the formulas of this section are transformed to corresponding formulas 
for backward differences by replacing $\Delta_1$ by $-\nabla_1$ and $\Delta$ by $-\nabla$. Formulas 
using central differences may be derived similarly (see Prob. 16). 


5.8 Summation and Integration. The Euler-Maclaurin Sum Formula 


The problem of evaluating a sum 


$$\sum_{\nu=m}^{k-1} f_\nu = f_m + f_{m+1} + \cdots + f_{k-1} \qquad (k > m)$$


where $f_\nu = f(x_0 + \nu h)$, is closely related to the problem of determining a 
function F(x) such that 


$$\Delta F(x) = f(x) \tag{5.8.1}$$

since, if any such function F(x) is known, there follows immediately 


$$\begin{aligned}
\sum_{\nu=m}^{k-1} f_\nu &= (F_{m+1} - F_m) + (F_{m+2} - F_{m+1}) + \cdots + (F_{k-1} - F_{k-2}) + (F_k - F_{k-1}) \\
&= F_k - F_m \equiv \left[F_\nu\right]_m^k
\end{aligned} \tag{5.8.2}$$

It should be noticed that the upper limit in the last term exceeds by unity the 
upper limit in the original sum. 

If we invert (5.8.1) in the symbolic form $F_k = \Delta^{-1} f_k$, it follows that we 
may write 


$$\Delta^{-1} f_k = C + \sum_{\nu=M}^{k-1} f_\nu \qquad\text{and}\qquad \sum_{\nu=m}^{k-1} f_\nu = \left[\Delta^{-1} f_\nu\right]_m^k \tag{5.8.3}$$


where C is an arbitrary constant and M is an arbitrarily fixed integer such that 
$M \le m \le k$, with the usual convention that 


$$\sum_{\nu=m}^{m-1} f_\nu = 0$$


Thus we may refer to $\Delta^{-1} f_k$ as an indefinite sum of $f_k$ and may correspondingly 
consider indefinite summation to be the inverse of the process of differencing, 
just as indefinite integration is the inverse of differentiation. As was noted 
previously, to any one inverse $\Delta^{-1} f(x)$ we may add any function $\omega_h(x)$ which 
is of period h since any such function is annihilated by $\Delta$. However, if only 
values of x which differ from some fixed value $x_0$ by integral multiples of h are 
involved, then, for that set of values of x, the additive function $\omega_h(x)$ reduces to 
the constant C, which itself disappears in definite summation between limits. 

There exist extensive tables of sum functions (analogous to tables of 
integrals) which facilitate the evaluation of many special sums by use of (5.8.3). 
We consider next some other techniques for evaluating or approximating the 
value of a sum. A simple formula for summing any polynomial p(x) is obtained 
by writing 


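$$\begin{aligned}
\sum_{k=0}^{r-1} p_k &= (1 + E + E^2 + \cdots + E^{r-1})\,p_0 = \frac{E^r - 1}{E - 1}\,p_0 = \frac{(1 + \Delta)^r - 1}{\Delta}\,p_0 \\
&= \left[r + \frac{r(r - 1)}{2!}\,\Delta + \frac{r(r - 1)(r - 2)}{3!}\,\Delta^2 + \cdots\right] p_0
\end{aligned} \tag{5.8.4}$$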


Thus, for example, in order to sum the series $1^2 + 2^2 + \cdots + r^2$, we may 
take $p(x) = (x + 1)^2$, $x_0 = 0$, and h = 1. With $p_0 = 1$, $\Delta p_0 = 3$, $\Delta^2 p_0 = 2$, 
$\Delta^3 p_0 = \cdots = 0$, Eq. (5.8.4) gives 


$$1^2 + 2^2 + \cdots + r^2 = r + 3\,\frac{r(r - 1)}{2!} + 2\,\frac{r(r - 1)(r - 2)}{3!} = \tfrac16\,r(r + 1)(2r + 1)$$
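The use of (5.8.4) for an arbitrary polynomial requires only its leading forward differences. A minimal Python sketch follows, with hypothetical function names and with $x_0 = 0$, h = 1 assumed; the coefficient of $\Delta^j p_0$ in (5.8.4) is the binomial coefficient $\binom{r}{j+1}$.

```python
from math import comb

def leading_differences(values):
    """Return [p_0, Δp_0, Δ²p_0, ...] from equally spaced samples."""
    out, row = [], list(values)
    while row:
        out.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return out

def polynomial_sum(p, degree, r):
    """Sum p(0) + p(1) + ... + p(r-1) by (5.8.4), with x0 = 0 and h = 1."""
    diffs = leading_differences([p(k) for k in range(degree + 1)])
    # coefficient of Δ^j p_0 is r(r-1)...(r-j)/(j+1)! = C(r, j+1)
    return sum(comb(r, j + 1) * d for j, d in enumerate(diffs))

squares = polynomial_sum(lambda x: (x + 1) ** 2, 2, 10)   # 1² + 2² + ... + 10²
```

Only degree + 1 samples of p are needed, however large r may be, which is the economy noted in the text for polynomials of low degree.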


The formula (5.8.4) is principally useful for the finite summation of a 
polynomial of degree n small relative to the number of terms r, so that the 
number of terms in the transformed series is small relative to the original 
number. In order to obtain a formula which is of more general usefulness in 
finite or infinite summation, as well as in numerical integration, we again first 
restrict attention to a polynomial p(x). From the operational identity 


$$DJ\Delta^{-1} = 1 \tag{5.8.5}$$


we may deduce that 


$$h\,p(x) = \frac{hD}{e^{hD} - 1}\,J p(x) \tag{5.8.6}$$


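The coefficients $B_\nu$ in the expansion 


$$\frac{t}{e^t - 1} = \sum_{\nu=0}^{\infty} \frac{B_\nu}{\nu!}\,t^\nu \tag{5.8.7}$$


for small $|t|$, are the so-called Bernoulli numbers, which occur in many fields 
of mathematics.† It is found that $B_3 = B_5 = B_7 = \cdots = 0$, and the following 
additional values may be listed: 


$$\begin{aligned}
B_0 &= 1 & B_1 &= -\tfrac12 & B_2 &= \tfrac16 & B_4 &= -\tfrac{1}{30} \\
B_6 &= \tfrac{1}{42} & B_8 &= -\tfrac{1}{30} & B_{10} &= \tfrac{5}{66} & B_{12} &= -\tfrac{691}{2730} \\
B_{14} &= \tfrac76 & B_{16} &= -\tfrac{3617}{510} & B_{18} &= \tfrac{43867}{798} & B_{20} &= -\tfrac{174611}{330}
\end{aligned} \tag{5.8.8}$$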
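The values (5.8.8) follow from multiplying (5.8.7) through by $e^t - 1$ and equating coefficients of like powers of t, which gives $\sum_{j=0}^{n}\binom{n+1}{j}B_j = 0$ for $n \ge 1$ with $B_0 = 1$. A Python sketch of that recurrence, with an assumed function name, is:

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(count):
    """B_0, ..., B_(count-1) with the convention t/(e^t - 1) = Σ B_ν t^ν/ν!."""
    B = [Fraction(1)]
    for n in range(1, count):
        # From (e^t - 1) · Σ B_ν t^ν/ν! = t:  Σ_{j=0}^{n} C(n+1, j) B_j = 0 for n >= 1
        B.append(-sum(comb(n + 1, j) * B[j] for j in range(n)) / (n + 1))
    return B

B = bernoulli_numbers(21)   # B[0], ..., B[20]
```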


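Hence, with this notation, (5.8.6) can be expressed in the form 


$$h\,p(x) = \sum_{\nu=0}^{\infty} \frac{B_\nu}{\nu!}\,h^\nu D^\nu J p(x)$$

or 


$$h\,p(x) = \int_x^{x+h} p(t)\,dt + \sum_{\nu=1}^{\infty} \frac{B_\nu}{\nu!}\,h^\nu D^\nu J p(x) \tag{5.8.9}$$


† The notation $B_\nu$ is sometimes used for the present $B_{2\nu}$. 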
By using (5.8.5) to replace $D^\nu J p(x)$ by $D^{\nu-1}[p(x + h) - p(x)]$, we may express 
this result in the more explicit form 


$$p(x) = \frac{1}{h}\int_x^{x+h} p(t)\,dt + \sum_{\nu=1}^{\infty} \frac{B_\nu}{\nu!}\,h^{\nu-1}\left[p^{(\nu-1)}(x + h) - p^{(\nu-1)}(x)\right] \tag{5.8.10}$$


If we write (5.8.10) for 
$x = x_0,\ x_1\ (= x_0 + h),\ \ldots,\ x_{r-1}\ [= x_0 + (r - 1)h]$ 


and sum the results, noticing the "telescoping" of the resultant terms in brackets, 
we deduce the identity 


$$\sum_{k=0}^{r-1} p_k = \frac{1}{h}\int_{x_0}^{x_r} p(x)\,dx + \sum_{\nu=1}^{\infty} \frac{B_\nu}{\nu!}\,h^{\nu-1}\left[p_r^{(\nu-1)} - p_0^{(\nu-1)}\right] \tag{5.8.11}$$

where $p_k = p(x_k)$ and $p_k^{(\nu-1)} = p^{(\nu-1)}(x_k)$. This result is usually known as the 
Euler-Maclaurin sum formula for a polynomial, although that name is also 
sometimes applied instead to (5.8.10), which leads to (5.8.11), or to still another 
formula, which generalizes (5.8.11). 

It can be written in a somewhat more convenient form by making use 
of the fact that all Bernoulli numbers with odd subscripts greater than unity 
are zero. Thus, if we extract the term corresponding to v = 1, and afterward 
replace v by 2i, we obtain the form 


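$$\sum_{k=0}^{r} p_k = \frac{1}{h}\int_{x_0}^{x_r} p(x)\,dx + \tfrac{1}{2}(p_0 + p_r) + \sum_{i=1}^{\infty} \frac{B_{2i} h^{2i-1}}{(2i)!}\left[p_r^{(2i-1)} - p_0^{(2i-1)}\right] \tag{5.8.12}$$
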
If the degree of the polynomial p(x) is 2m or 2m + 1, the series on the right 
terminates when i = m. 
When f(x) is not a polynomial, the result of replacing p(x) by f(x) in
the series (5.8.12) must be terminated, say, with i = m, and an appropriate
error term must be introduced, so that we write

$$\sum_{k=0}^{r} f_k = \frac{1}{h}\int_{x_0}^{x_r} f(x)\,dx + \tfrac{1}{2}(f_0 + f_r) + \sum_{i=1}^{m} \frac{B_{2i} h^{2i-1}}{(2i)!}\left[f_r^{(2i-1)} - f_0^{(2i-1)}\right] + E_m \tag{5.8.13}$$

It is known (see Prob. 28) that this error term is expressible in the form

$$E_m = \frac{r\,B_{2m+2}\,h^{2m+2}}{(2m + 2)!}\,f^{(2m+2)}(\xi) \tag{5.8.14}$$

where $x_0 < \xi < x_r$, when r is finite. When $r \to \infty$ and also $x_r \to \infty$, this
form becomes indeterminate and must be replaced by a somewhat more elaborate
one.

Frequently it is possible to avoid the use of the error formula (5.8.14)
or its substitute by making use of the fact (see Steffensen [1950]) that if
$f^{(2m+2)}(x)$ and $f^{(2m+4)}(x)$ do not change sign for $x_0 < x < x_r$, then $E_m$ is
numerically smaller than the first neglected term and is of the same sign. More
generally, if it is known only that $f^{(2m+2)}(x)$ does not change sign, then $E_m$
is numerically smaller than twice the first neglected term and of the same sign
(see Prob. 28 and Steffensen [1950]).

The fact that rules of this type apply rather frequently to interpolation 
series and to allied series makes the procedure of using the first omitted term 
as a basis for estimating the order of magnitude of the error somewhat less 
hazardous in connection with such series than with convergent series more often 
encountered in other fields. 

A formula similar to (5.8.13), but summing instead the ordinates midway
between the successive ordinates involved in (5.8.13), is sometimes called
the second Euler-Maclaurin formula (see Steffensen [1950]) and is of the form

$$\sum_{k=0}^{r-1} f_{k+1/2} = \frac{1}{h}\int_{x_0}^{x_r} f(x)\,dx - \sum_{i=1}^{m} \frac{(1 - 2^{1-2i})\,B_{2i} h^{2i-1}}{(2i)!}\left[f_r^{(2i-1)} - f_0^{(2i-1)}\right] + E_m \tag{5.8.15}$$

where

$$E_m = -\,\frac{r\,(1 - 2^{-2m-1})\,B_{2m+2}\,h^{2m+2}}{(2m + 2)!}\,f^{(2m+2)}(\xi) \tag{5.8.16}$$

Here, again, if $f^{(2m+2)}(x)$ and $f^{(2m+4)}(x)$ are of constant sign in $(x_0, x_r)$, the
error is numerically smaller than the first neglected term and is of the same sign.
If only $f^{(2m+2)}(x)$ is known to be of constant sign, then $E_m$ can be shown to be
numerically smaller than three times the first neglected term and of the same sign.
It may be seen that the correction terms in both (5.8.13) and (5.8.15)
all vanish if, say, f(x) is periodic, of period $x_r - x_0$, although the error term
naturally remains (see, for example, Prob. 36 of Chap. 3).† Here it is of interest
to notice that m can be assigned any positive integral value in $E_m$, assuming
only that $f^{(2m+2)}(x)$ is continuous on $[x_0, x_r]$. Generally there is one such value,
in correspondence with a specified f(x), for which the corresponding error bound
is minimized.

† For significant applications of these formulas in such cases, see Fettis [1955, 1958]
and Luke [1956].

We see that these formulas each relate a given sum and an integral in
terms of an associated sum of m terms and a corresponding error term, where m
can be chosen at pleasure. While the first formula is useful both for numerical
integration and for numerical summation of series, the second is used chiefly
for integration.

Before considering the use of these formulas for approximate summation,
we note that the Euler-Maclaurin formula (5.8.12) can be written in the form

$$\begin{aligned}
\int_{x_0}^{x_r} f(x)\,dx = {} & h\left(\tfrac{1}{2}f_0 + f_1 + f_2 + \cdots + f_{r-1} + \tfrac{1}{2}f_r\right) - \frac{h^2}{12}\,(f_r' - f_0') \\
& + \frac{h^4}{720}\,(f_r''' - f_0''') - \frac{h^6}{30240}\,(f_r^{\mathrm{v}} - f_0^{\mathrm{v}}) + \cdots \\
& - \frac{B_{2m} h^{2m}}{(2m)!}\left[f_r^{(2m-1)} - f_0^{(2m-1)}\right] - hE_m
\end{aligned} \tag{5.8.17}$$

where $E_m$ is defined by (5.8.14), and hence can be considered as the trapezoidal
rule with correction terms expressed in terms of derivatives. Similarly, the
formula (5.8.15) can be written in the form

$$\begin{aligned}
\int_{x_0}^{x_r} f(x)\,dx = {} & h\left(f_{1/2} + f_{3/2} + \cdots + f_{r-(1/2)}\right) + \frac{h^2}{24}\,(f_r' - f_0') \\
& - \frac{7h^4}{5760}\,(f_r''' - f_0''') + \frac{31h^6}{967680}\,(f_r^{\mathrm{v}} - f_0^{\mathrm{v}}) - \cdots \\
& + \frac{(1 - 2^{1-2m})\,B_{2m} h^{2m}}{(2m)!}\left[f_r^{(2m-1)} - f_0^{(2m-1)}\right] - hE_m
\end{aligned} \tag{5.8.18}$$

where $E_m$ here is defined by (5.8.16), and hence can be considered as the repeated
midpoint rule with derivative correction terms. A comparison of (5.8.14) and
(5.8.16) shows that the second formula tends to be slightly more accurate than
the first, on the average, when truncated with the same number of correction
terms.
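The two corrected rules are easily tried numerically. The following Python
sketch applies (5.8.17) and (5.8.18), each truncated after the $h^4$ correction,
to the integral of $e^x$ over [0, 1] with only four subintervals; the test
integrand and the function names are assumptions made for illustration.

```python
import math

def trapezoid_corrected(f, d1, d3, a, b, r):
    """(5.8.17) truncated after the h^4 term; d1 and d3 are f' and f'''."""
    h = (b - a) / r
    t = h * (0.5 * f(a) + sum(f(a + k * h) for k in range(1, r)) + 0.5 * f(b))
    return t - h**2 / 12 * (d1(b) - d1(a)) + h**4 / 720 * (d3(b) - d3(a))

def midpoint_corrected(f, d1, d3, a, b, r):
    """(5.8.18) truncated after the h^4 term; d1 and d3 are f' and f'''."""
    h = (b - a) / r
    m = h * sum(f(a + (k + 0.5) * h) for k in range(r))
    return m + h**2 / 24 * (d1(b) - d1(a)) - 7 * h**4 / 5760 * (d3(b) - d3(a))

exact = math.e - 1.0
# Every derivative of exp is exp itself, so the same function is passed three times.
err_trap = trapezoid_corrected(math.exp, math.exp, math.exp, 0.0, 1.0, 4) - exact
err_mid = midpoint_corrected(math.exp, math.exp, math.exp, 0.0, 1.0, 4) - exact
print(err_trap, err_mid)
```

In both cases the remaining error should be of the order of the first omitted
correction, that is, of order $h^6$.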
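A useful modification of (5.8.17) is obtained when the derivatives at $x_0$
are expressed in terms of forward differences, by using (5.3.5), and those at
$x_r$ are expressed in terms of backward differences, by using (5.3.9):

$$\begin{aligned}
h f_0' &= \Delta f_0 - \tfrac{1}{2}\Delta^2 f_0 + \tfrac{1}{3}\Delta^3 f_0 - \tfrac{1}{4}\Delta^4 f_0 + \tfrac{1}{5}\Delta^5 f_0 - \cdots \\
h f_r' &= \nabla f_r + \tfrac{1}{2}\nabla^2 f_r + \tfrac{1}{3}\nabla^3 f_r + \tfrac{1}{4}\nabla^4 f_r + \tfrac{1}{5}\nabla^5 f_r + \cdots \\
h^3 f_0''' &= \Delta^3 f_0 - \tfrac{3}{2}\Delta^4 f_0 + \tfrac{7}{4}\Delta^5 f_0 - \cdots \\
h^3 f_r''' &= \nabla^3 f_r + \tfrac{3}{2}\nabla^4 f_r + \tfrac{7}{4}\nabla^5 f_r + \cdots \\
h^5 f_0^{\mathrm{v}} &= \Delta^5 f_0 - \cdots \\
h^5 f_r^{\mathrm{v}} &= \nabla^5 f_r + \cdots
\end{aligned}$$

The result of this substitution is of the form

$$\begin{aligned}
\int_{x_0}^{x_r} f(x)\,dx = {} & h\left(\tfrac{1}{2}f_0 + f_1 + f_2 + \cdots + f_{r-1} + \tfrac{1}{2}f_r\right) \\
& - \frac{h}{12}\,(\nabla f_r - \Delta f_0) - \frac{h}{24}\,(\nabla^2 f_r + \Delta^2 f_0) - \frac{19h}{720}\,(\nabla^3 f_r - \Delta^3 f_0) \\
& - \frac{3h}{160}\,(\nabla^4 f_r + \Delta^4 f_0) - \frac{863h}{60480}\,(\nabla^5 f_r - \Delta^5 f_0) - \cdots
\end{aligned} \tag{5.8.19}$$

and is known as Gregory’s formula. If no differences beyond the rth are retained,
only values of the integrand in the interval of integration are involved. It
appears that no tractable expression is known for the implied error term to be
inserted after an appropriate truncation of the series. This formula can also be
derived directly by operational methods (see Prob. 29). A similar modification
of (5.8.18) is clearly possible.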
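Because Gregory’s formula needs only tabulated ordinates, it is simple to
program. The sketch below is a minimal illustration under stated assumptions:
the coefficients are those displayed in (5.8.19), the names `GREGORY`,
`differences`, and `gregory` are hypothetical, and the requested order must not
exceed the number of subintervals r.

```python
import math

# Coefficients of the first five end corrections in (5.8.19).
GREGORY = [1 / 12, 1 / 24, 19 / 720, 3 / 160, 863 / 60480]

def differences(values, order):
    """Return the list of differences of the given order of a sequence."""
    row = list(values)
    for _ in range(order):
        row = [b - a for a, b in zip(row, row[1:])]
    return row

def gregory(f, a, b, r, order=3):
    """Trapezoidal rule with Gregory end corrections through `order`."""
    h = (b - a) / r
    y = [f(a + k * h) for k in range(r + 1)]
    total = h * (sum(y) - 0.5 * (y[0] + y[-1]))
    for j in range(1, order + 1):
        d = differences(y, j)
        forward, backward = d[0], d[-1]   # Δ^j f_0 and ∇^j f_r
        total -= h * GREGORY[j - 1] * (backward + (-1) ** j * forward)
    return total

exact = math.e - 1.0
print(gregory(math.exp, 0.0, 1.0, 8, order=0) - exact)   # plain trapezoidal rule
print(gregory(math.exp, 0.0, 1.0, 8, order=4) - exact)   # with four corrections
```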

Alternatively, the derivatives at the end points can be replaced by
mean central differences of odd order. The formula so obtained from (5.8.17) 
is associated with Gauss and is derived in Prob. 18. Although the leading 
coefficients of the correction terms decrease more rapidly in magnitude than 
those in Gregory’s formula, Gauss’ formula has the disadvantage that it always 
involves ordinates which lie outside the interval of integration. 


5.9 Approximate Summation 


The formulas (5.8.17) and (5.8.19) are expressed in a form suitable for approx- 
imate evaluation of the relevant integral. When the formulas are to be used 
instead, say, for approximate summation of an infinite series, under the assump- 
tion that the integral can be evaluated (or suitably approximated) otherwise, 
they may be expressed in the form 


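$$\sum_{k=0}^{\infty} f_k = \frac{1}{h}\int_{x_0}^{\infty} f(x)\,dx + \left(\tfrac{1}{2}f_0 - \tfrac{1}{12}h f_0' + \tfrac{1}{720}h^3 f_0''' - \cdots\right) \tag{5.9.1}$$

$$\sum_{k=0}^{\infty} f_k = \frac{1}{h}\int_{x_0}^{\infty} f(x)\,dx + \left(\tfrac{1}{2}f_0 - \tfrac{1}{12}\Delta f_0 + \tfrac{1}{24}\Delta^2 f_0 - \cdots\right) \tag{5.9.2}$$

where $f_k = f(x_0 + kh)$, when applied formally to a function f(x), under the
assumptions that f(x) and its derivatives vanish as $x \to \infty$, and that the series
and integral are convergent.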

If the terms f, are of constant sign and decrease slowly in magnitude, so 
that the given series converges slowly, the successive terms in the transformed 
series generally decrease rapidly in magnitude, at least up to a certain stage. 
Thus these series, while generally asymptotic, are often useful for calculation in
such cases. In illustration, for $f(x) = 1/x^2$ and $h = 1$, the relation (5.9.1)
becomes

$$\frac{1}{a^2} + \frac{1}{(a + 1)^2} + \frac{1}{(a + 2)^2} + \cdots = \frac{1}{a} + \frac{1}{2a^2} + \frac{1}{6a^3} - \frac{1}{30a^5} + \cdots + \frac{B_{2m}}{a^{2m+1}} + E_m \tag{5.9.3}$$

since here $f^{(2m-1)}(a) = -(2m)!/a^{2m+1}$. Whereas the series on the left converges
rather slowly, the terms on the right decrease rapidly when a is fairly large. Thus,
if we take a = 100, there follows

$$\frac{1}{100^2} + \frac{1}{101^2} + \frac{1}{102^2} + \cdots = 10^{-2} + \tfrac{1}{2} \times 10^{-4} + \tfrac{1}{6} \times 10^{-6} - \tfrac{1}{30} \times 10^{-10} + E_2$$

and the retention of only the first three terms on the right gives

$$\frac{1}{100^2} + \frac{1}{101^2} + \frac{1}{102^2} + \cdots \doteq 0.0100501667$$

correctly to 10 places. Nearly $2 \times 10^{10}$ terms of the original series would be
needed to supply this accuracy!

It is of interest to notice that $B_{2i}$ was shown by Euler to be expressible
in the form

$$B_{2i} = \frac{2(-1)^{i-1}(2i)!}{(2\pi)^{2i}}\left(1 + \frac{1}{2^{2i}} + \frac{1}{3^{2i}} + \cdots\right) \qquad (i \ge 1) \tag{5.9.4}$$

Thus, since (2i)! ultimately grows more rapidly than $a^{2i}$ for any fixed a, it
follows that, whereas $|B_{2i}|$ at first decreases with i, ultimately $|B_{2i}|$ increases more
rapidly than $a^{2i}$ for any fixed a, as i increases without limit. Hence it is evident
that the result of omitting $E_m$ in the right-hand member of (5.9.3) will not
converge as $m \to \infty$. The expression (5.8.14) is of no use when $r \to \infty$. However,
the test described following that equation shows indeed that here $E_m$ is smaller
in magnitude than the first neglected term, and that it decreases in magnitude
until m is approximately equal to $\pi a$, after which it begins to increase
unboundedly in magnitude and to oscillate in sign. In the case a = 100, this
would mean that the retention of additional terms would continue to improve
the approximation until more than 300 terms were taken. However, in the
case a = 1, for which the left-hand member of (5.9.3) has the known value
$\pi^2/6 \doteq 1.64493$, in accordance with (5.9.4), the right-hand member becomes

$$1 + \tfrac{1}{2} + \tfrac{1}{6} - \tfrac{1}{30} + \tfrac{1}{42} - \tfrac{1}{30} + \tfrac{5}{66} - \tfrac{691}{2730} + \tfrac{7}{6} - \cdots$$

Here the error E associated with the truncation of this series after n terms varies
with n as is indicated in the following table.

     n         E
     1       0.645
     2       0.145
     3      -0.022
     4       0.012
     5      -0.012
     6       0.021
     7      -0.055
     8       0.198
     9      -0.968
    10       6.124

This type of phenomenon, in which the successive members of a sequence
of approximations first approach nearer and nearer to the desired result, and
then begin to oscillate about it with ever-increasing amplitude, arises very
frequently in numerical analysis. Whereas such a situation can often be brought
about by prolonged propagation of roundoff errors (and is too often attributed
to this cause by computers!), we have seen here, and in Secs. 3.9 and 4.11,
that it can also result from successively progressing to procedures of “higher-order
accuracy,” when this progress leads eventually to using too many terms
of a divergent (but asymptotic) series, even though it be assumed that no
roundoff errors are introduced. (Additional situations of this type will be
encountered in other chapters.)
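Both numerical illustrations above can be reproduced in a few lines. The
following Python sketch builds the terms of the right-hand member of (5.9.3)
exactly and then evaluates the case a = 100 and the error table for a = 1. The
function names are illustrative, the Bernoulli numbers are generated by a
standard recurrence rather than by any method from the text, and the reference
value for a = 100 is obtained by subtracting the first 99 terms from $\pi^2/6$.

```python
from fractions import Fraction
from math import comb, pi

def bernoulli_numbers(n):
    """Return [B_0, ..., B_n] as exact fractions (B_1 = -1/2)."""
    B = [Fraction(1)]
    for k in range(1, n + 1):
        B.append(-sum(comb(k + 1, j) * B[j] for j in range(k)) / (k + 1))
    return B

def rhs_terms(a, n):
    """First n terms of the right-hand member of (5.9.3), omitting E_m."""
    B = bernoulli_numbers(max(2, 2 * n))
    terms = [Fraction(1, a), Fraction(1, 2 * a**2)]
    terms += [B[2 * i] / Fraction(a) ** (2 * i + 1) for i in range(1, n)]
    return terms[:n]

# a = 100: three terms, compared with pi^2/6 minus the first 99 terms.
approx_100 = float(sum(rhs_terms(100, 3)))
reference_100 = pi**2 / 6 - sum(1.0 / k**2 for k in range(1, 100))

# a = 1: error after n terms for n = 1, ..., 10.
errors = [pi**2 / 6 - float(sum(rhs_terms(1, n))) for n in range(1, 11)]

print(approx_100, reference_100)
print([round(e, 3) for e in errors])
```

The second printed list should agree with the table, showing the error
decreasing to about 0.012 and then growing without bound.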

The preceding transformations usually are not useful when the terms in the 
given series fluctuate in sign. However, in those situations when the signs of 
successive terms steadily alternate, there exist more appropriate transformations, 
of similar type, which possess the additional advantage that their use does not 
involve the evaluation of an integral. Their formal derivation is simply effected 
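by noticing that the operational relation

$$p_0 - p_1 + p_2 - p_3 + \cdots = (1 - E + E^2 - \cdots)\,p_0 = \frac{1}{1 + E}\,p_0 \tag{5.9.5}$$

is valid for any polynomial p(x), and that we have also

$$\frac{1}{1 + E} = \frac{1}{2}\left(1 - \tanh\frac{hD}{2}\right) = \frac{1}{2 + \Delta} \tag{5.9.6}$$

Hence, by formally replacing p by f in (5.9.5) and expanding the operator
1/(1 + E) in accordance with (5.9.6), we obtain the relations

$$\sum_{k=0}^{\infty} (-1)^k f_k = \tfrac{1}{2}\left(f_0 - \tfrac{1}{2}h f_0' + \tfrac{1}{24}h^3 f_0''' - \tfrac{1}{240}h^5 f_0^{\mathrm{v}} + \tfrac{17}{40320}h^7 f_0^{\mathrm{vii}} - \cdots\right) \tag{5.9.7}$$

$$\sum_{k=0}^{\infty} (-1)^k f_k = \tfrac{1}{2}\left(f_0 - \tfrac{1}{2}\Delta f_0 + \tfrac{1}{4}\Delta^2 f_0 - \cdots + \frac{(-1)^k}{2^k}\,\Delta^k f_0 + \cdots\right) \tag{5.9.8}$$

The second relation (5.9.8), expressed in terms of forward differences, is
often known as Euler’s transformation.† It is known (see Hardy [1949])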
that the transformed series in (5.9.8) will converge whenever the given series 
does so, and to the same sum. (A transformation having this property some- 
times is said to be regular.) Indeed, the transformed series may converge when 
the parent series does not, in which case the sum of the transformed series is 
often called the Euler sum of the parent series. The other transformed series 
(5.9.7) generally is asymptotic, but the rate of effective convergence of the 
leading terms often is more rapid. 

In illustration, if only the first four terms of the series

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{(-1)^{n+1}}{n} + \cdots \quad (= \log 2) \tag{5.9.9}$$

are summed initially, to give

$$1 - \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{7}{12}$$

the use of (5.9.7) and (5.9.8), with $f(x) = 1/x$, $x_0 = 5$, and $h = 1$, is found to
yield the transformed series

$$\log 2 = \tfrac{7}{12} + \tfrac{1}{2}\left[\tfrac{1}{5} + \tfrac{1}{50} - \tfrac{1}{2500} + \tfrac{1}{31250} - \tfrac{17}{3125000} + \cdots\right] \tag{5.9.10}$$

$$\log 2 = \tfrac{7}{12} + \tfrac{1}{2}\left[\tfrac{1}{5} + \tfrac{1}{60} + \tfrac{1}{420} + \tfrac{1}{2240} + \tfrac{1}{10080} + \cdots\right] \tag{5.9.11}$$

after an appropriate tabulation and differencing of the ordinates $f_k$ in the second
case.

Retention of five terms of the transformed series in (5.9.10) yields an
approximation to $\log 2 \doteq 0.693147181$ with an error smaller than $6 \times 10^{-7}$,
whereas the same truncation of the Euler series (5.9.11) is in error by about
$2 \times 10^{-5}$. If additional terms were retained in these two series, the second
would continue to converge indefinitely, whereas the oscillation of the first
series eventually (after about nine terms) would begin to increase unboundedly.
More efficient transformations would have been effected by summing more than 
four terms of the given series in advance. 
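The two transformed series for log 2 can be checked directly. In the Python
sketch below, the derivative form uses the closed-form derivatives of 1/x, and
the difference form tabulates and differences the ordinates 1/5, 1/6, ..., 1/9;
the variable and function names are illustrative only.

```python
import math

def leading_differences(values):
    """Return [f_0, Δf_0, Δ²f_0, ...] from equally spaced ordinates."""
    diffs, row = [], list(values)
    while row:
        diffs.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return diffs

def deriv(j, x):
    """j-th derivative of 1/x."""
    return (-1) ** j * math.factorial(j) / x ** (j + 1)

head = 1 - 1 / 2 + 1 / 3 - 1 / 4      # first four terms of (5.9.9), 7/12
x0, h = 5.0, 1.0

# Coefficients of h^j f^(j)(x0) inside the parentheses of (5.9.7).
coeffs = {0: 1.0, 1: -1 / 2, 3: 1 / 24, 5: -1 / 240, 7: 17 / 40320}
derivative_form = head + 0.5 * sum(c * h**j * deriv(j, x0) for j, c in coeffs.items())

# Euler's transformation (5.9.8) with differences through the fourth.
d = leading_differences([1.0 / (x0 + k * h) for k in range(5)])
euler_form = head + 0.5 * sum((-1) ** k * d[k] / 2**k for k in range(5))

print(abs(derivative_form - math.log(2)), abs(euler_form - math.log(2)))
```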

A useful variant of the Euler transformation (5.9.8), which also yields a
convergent series when the parent series converges, is expressible in the form‡

$$\sum_{k=0}^{\infty} (-1)^k f_k = \frac{1}{2}\sum_{i=0}^{n} \frac{(-1)^i}{2^i}\,\Delta^i f_0 + \frac{(-1)^{n+1}}{2^{n+1}}\sum_{k=0}^{\infty} (-1)^k \Delta^{n+1} f_k \tag{5.9.12}$$

The right-hand member can be interpreted as the result of truncating the Euler
formula with nth differences and expressing the error term as an infinite series
of (n + 1)th differences. In particular, if $\Delta^{n+1} f_k$ is of constant sign for $k \ge 0$
and tends steadily to zero as $k \to \infty$, we may deduce that the truncation error
in the Euler formula (5.9.8) is smaller in magnitude than twice the first omitted
term and is of the same sign. This situation will exist if $f^{(n+1)}(x)$ is of constant
sign when $x \ge x_0$, and if it tends steadily to zero as $x \to \infty$.

† This transformation is closely related to that considered in Probs. 8 and 9 of Chap. 1.
‡ This formula can be obtained by operational methods or, rigorously, by n iterations
of the transformation considered in Prob. 8 of Chap. 1.

The Euler transformation is most efficient when the alternating series
$f_0 - f_1 + f_2 - \cdots$ converges very slowly, so that $f_k$ tends to zero, say, like
$1/k$ as $k \to \infty$. When $f_k$ tends to zero, say, like $r^k$ ($r < 1$), so that the series
simulates an alternating geometric series, a useful generalization results from
writing

$$f_k = r^k g_k \tag{5.9.13}$$

where r may be identified, for example, with a representative value of $f_{k+1}/f_k$
or with its limit as $k \to \infty$. The formal symbolic relation

$$f_0 - f_1 + f_2 - f_3 + \cdots = (1 - E + E^2 - E^3 + \cdots)\,f_0$$

then becomes

$$f_0 - f_1 + f_2 - f_3 + \cdots = (1 - rE + r^2E^2 - r^3E^3 + \cdots)\,g_0 = \frac{1}{1 + rE}\,g_0 = \frac{1}{(1 + r) + r\Delta}\,g_0$$

and yields the formula

$$\sum_{k=0}^{\infty} (-1)^k r^k g_k = \frac{1}{1 + r}\left[g_0 - \frac{r}{1 + r}\,\Delta g_0 + \left(\frac{r}{1 + r}\right)^2 \Delta^2 g_0 - \cdots\right] \tag{5.9.14}$$

which is equivalent to (5.9.8) when r is taken to be unity. For any fixed r > 0,
the right-hand member will converge when the left-hand member does so.

Other generalizations of a similar nature are readily devised. Thus, if we
write $f_k = c_k g_k$, we may derive the formal relation

$$\sum_{k=0}^{\infty} (-1)^k c_k g_k = \left[\phi(1) + \frac{\phi'(1)}{1!}\,\Delta + \frac{\phi''(1)}{2!}\,\Delta^2 + \cdots\right] g_0 \tag{5.9.15}$$

where $\phi(t)$ is the function possessing the Taylor expansion

$$\phi(t) = \sum_{k=0}^{\infty} (-1)^k c_k t^k \tag{5.9.16}$$

under the assumption that the interval of convergence includes $t = 1$. Here
$c_k$ is to be determined so that $g_k$ tends to vary slowly with increasing k and,
desirably, so that $\phi(t)$ is identifiable in closed form.

A related class of transformations, which frequently accelerate the convergence
of alternating series, deals directly with the sequence of partial sums
$S_k$, such that

$$S_k = f_0 - f_1 + f_2 - f_3 + \cdots + (-1)^k f_k \tag{5.9.17}$$

and replaces the sequence $S_0, S_1, \ldots, S_k, \ldots$ by a new sequence $T_0, T_1, \ldots,
T_k, \ldots$, where

$$T_k = \frac{w_0 S_k + w_1 S_{k-1} + \cdots + w_k S_0}{w_0 + w_1 + \cdots + w_k} \tag{5.9.18}$$

with a suitable definition of the weighting coefficients $w_0, w_1, \ldots, w_k$, after
which the transformation may be iterated. It is known (see Hardy [1949])
that the T sequence will converge to the same limit as does the S sequence if the
conditions

$$w_0 > 0 \qquad w_r \ge 0 \quad (1 \le r \le k) \qquad \lim_{k \to \infty} \frac{w_k}{w_0 + w_1 + \cdots + w_k} = 0 \tag{5.9.19}$$

are satisfied.

The choices

$$w_0 = w_1 = \cdots = w_k = 1 \tag{5.9.20}$$

and

$$w_0 = w_1 = 1 \qquad w_2 = w_3 = \cdots = w_k = 0 \tag{5.9.21}$$

are most often used, the latter often being particularly efficient (when the f’s
are positive), and they are associated with the names of Cesàro and Hutton,
respectively.

As in the case of the Euler transformation, it happens that the Cesàro
sequence may converge to a limit C when the parent series $\sum (-1)^k f_k$ is divergent,
in which case C may be called the Cesàro sum of that series. A similar statement
applies to the Hutton sequence.
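The weighted means (5.9.18) are easily formed once the partial sums are
available. The Python sketch below applies the Cesàro and Hutton weights to
the first twelve partial sums of the series (5.9.9) for log 2, and then iterates
the Hutton transformation once more. The choice of test series and all names
in the code are assumptions made for illustration.

```python
import math

def partial_sums(terms):
    """S_k = f_0 - f_1 + ... + (-1)^k f_k for positive f_k, as in (5.9.17)."""
    sums, s = [], 0.0
    for k, f in enumerate(terms):
        s += (-1) ** k * f
        sums.append(s)
    return sums

def weighted_means(S, weights):
    """T_k of (5.9.18); weights(k) returns the list [w_0, ..., w_k]."""
    T = []
    for k in range(len(S)):
        w = weights(k)
        T.append(sum(w[j] * S[k - j] for j in range(k + 1)) / sum(w))
    return T

def cesaro(k):
    return [1.0] * (k + 1)                            # (5.9.20)

def hutton(k):
    return ([1.0, 1.0] + [0.0] * (k - 1))[: k + 1]    # (5.9.21)

S = partial_sums([1.0 / (k + 1) for k in range(12)])
C = weighted_means(S, cesaro)
H = weighted_means(S, hutton)
H2 = weighted_means(H, hutton)                        # transformation iterated once

for name, seq in (("S", S), ("Cesaro", C), ("Hutton", H), ("Hutton twice", H2)):
    print(name, seq[-1] - math.log(2))
```

For this slowly convergent alternating series the Hutton means, which simply
average consecutive partial sums, improve markedly on the raw partial sums.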


5.10 Error Terms in Integration Formulas 


This section presents methods of obtaining expressions for the error term to be 
inserted in a formula for numerical integration, obtained (by operational 
methods or otherwise) in such a way that it reduces to an identity when applied 
to a polynomial of sufficiently low degree, in those cases when the formula is 
applied to a function of more general type. The methods are readily modified 
to the consideration of formulas for interpolation or for numerical differentiation 
or other linear processes. For simplicity, we deal specifically with integral 
formulas involving only ordinates.

For present purposes, it is convenient to suppose that the formula at hand
is expressed explicitly in terms of the ordinates involved rather than differences
or divided differences. Also, in order to include formulas considered in Sec. 3.10
and a class of formulas to be developed in Chap. 8, as well as all others considered
so far, we suppose that the formula is of the rather general form

$$\int_a^b w(x) f(x)\,dx = \sum_{k=0}^{n} W_k f(x_k) + R \tag{5.10.1}$$

where w(x) is a prescribed weighting function, which is unity in most of the
formulas considered so far and which is nonnegative in [a, b] in most other
applications; where $x_0, x_1, \ldots, x_n$ are n + 1 abscissas, not necessarily equally spaced;
and where $W_0, W_1, \ldots, W_n$ are the corresponding so-called weighting coefficients.

It is supposed that the required error R is zero when f(x) is any polynomial
of degree N or less. If also R is not zero when f(x) is a polynomial of degree
N + 1, then N is called the degree of precision of the integration formula.
However, we suppose here only that the degree of precision is at least N, where
N is a known positive integer. We also assume explicitly that $w(x) \ge 0$ in
[a, b].


We may transpose Eq. (5.10.1) into the form

$$R[f(x)] = \int_a^b w(x) f(x)\,dx - \sum_{k=0}^{n} W_k f(x_k) \tag{5.10.2}$$

where the notation R[f(x)] is used to indicate that the operation involved in
the right-hand member has been effected on f(x). Our hypothesis, therefore,
is that

$$R[x^r] = 0 \qquad (r = 0, 1, 2, \ldots, N) \tag{5.10.3}$$


In order to treat situations in which some of the abscissas lie outside the
integration interval [a, b], we suppose that the abscissas are ordered in increasing
algebraic order and denote the smaller of $x_0$ and a by A and the larger of $x_n$
and b by B, so that all relevant values of x lie in the interval J = [A, B].
Attention is restricted to those functions which possess N + 1 continuous
derivatives on [A, B].

Then, for any values of x and X in [A, B], we can write

$$f(x) = f(X) + \frac{f'(X)}{1!}\,(x - X) + \cdots + \frac{f^{(N)}(X)}{N!}\,(x - X)^N + \frac{f^{(N+1)}(\xi)}{(N + 1)!}\,(x - X)^{N+1} \tag{5.10.4}$$

where, for any fixed X in J, $\xi$ depends upon x, but lies in (A, B). Since the first
N + 1 terms in the right-hand member comprise a polynomial of degree N,
which is annihilated by the operator in (5.10.2), the error R[f(x)] is the same as
the error term corresponding to the remainder term

$$E(x) = \frac{f^{(N+1)}(\xi)}{(N + 1)!}\,(x - X)^{N+1} \tag{5.10.5}$$

and hence

$$(N + 1)!\,R[f(x)] = \int_a^b w(x)\,(x - X)^{N+1} f^{(N+1)}(\xi)\,dx - \sum_{k=0}^{n} W_k\,(x_k - X)^{N+1} f^{(N+1)}(\xi_k) \tag{5.10.6}$$

where $\xi, \xi_0, \xi_1, \ldots, \xi_n$ all lie in (A, B).

This form of the error term is generally not a very useful one. However,
if we denote the maximum value of $|f^{(N+1)}(x)|$ on [A, B] by $M_{N+1}$ and notice
that $|x - X| \le (B - A)/2$ in [A, B] when $X = (A + B)/2$, it permits the
crude estimate

$$|R| \le \frac{M_{N+1} L^{N+1}}{2^{N+1}(N + 1)!}\left[\int_a^b w(x)\,dx + \sum_{k=0}^{n} |W_k|\right] \tag{5.10.7}$$

where

$$L = B - A \quad\text{and}\quad |f^{(N+1)}(x)| \le M_{N+1} \text{ on } [A, B] \tag{5.10.8}$$

Since R = 0 in (5.10.1) when f(x) = 1, there follows

$$\int_a^b w(x)\,dx = \sum_{k=0}^{n} W_k \tag{5.10.9}$$

Hence, in those cases when all the weights $W_k$ are nonnegative, the error bound
(5.10.7) can be expressed in the simpler form

$$|R| \le \frac{M_{N+1} L^{N+1}}{2^N (N + 1)!}\int_a^b w(x)\,dx \tag{5.10.10}$$

where L = b − a when none of the abscissas lies outside [a, b].

This error bound, while of simple form, is often extremely conservative.
In order to obtain a more useful form, we may replace the remainder (5.10.5)
by the integral form

$$E(x) = \frac{1}{N!}\int_X^x (x - s)^N f^{(N+1)}(s)\,ds \tag{5.10.11}$$

which possesses the advantage that no unknown parameter, corresponding to
the $\xi$ in (5.10.5), appears (see Sec. 1.9). If we identify X with A, the relation
(5.10.6) is then replaced by the form

$$N!\,R[f(x)] = \int_a^b w(x)\int_A^x (x - s)^N f^{(N+1)}(s)\,ds\,dx - \sum_{k=0}^{n} W_k\int_A^{x_k} (x_k - s)^N f^{(N+1)}(s)\,ds \tag{5.10.12}$$

In order to express this result in more convenient form, it is useful to
introduce the notation

$$(x - s)_+^k = \begin{cases} (x - s)^k & \text{when } x \ge s \\ 0 & \text{when } x < s \end{cases} \tag{5.10.13}$$

in accordance with which (5.10.12) can be written in the form

$$N!\,R[f(x)] = \int_a^b w(x)\int_A^B (x - s)_+^N f^{(N+1)}(s)\,ds\,dx - \sum_{k=0}^{n} W_k\int_A^B (x_k - s)_+^N f^{(N+1)}(s)\,ds \tag{5.10.14}$$

Since the integration limits are now constant, the order of integration is readily
reversed to give

$$N!\,R[f(x)] = \int_A^B f^{(N+1)}(s)\left[\int_a^b (x - s)_+^N w(x)\,dx - \sum_{k=0}^{n} W_k\,(x_k - s)_+^N\right] ds$$

or, equivalently,

$$R[f(x)] = \int_A^B G(s)\,f^{(N+1)}(s)\,ds \tag{5.10.15}$$

where G(s) is defined by the equation

$$N!\,G(s) = \int_a^b (x - s)_+^N w(x)\,dx - \sum_{k=0}^{n} W_k\,(x_k - s)_+^N \tag{5.10.16}$$

and may be called the influence function (or kernel function) for the integration
formula (5.10.1) relevant to N.†

It is useful to notice that G(s) can be considered as the error in (5.10.1)
when f(x) is identified with $(x - s)_+^N/N!$. The definition can also be expressed
in the more explicit form

$$N!\,G(s) = \left\{\begin{array}{ll} \displaystyle\int_a^b (x - s)^N w(x)\,dx & (s < a) \\[2mm] \displaystyle\int_s^b (x - s)^N w(x)\,dx & (a \le s \le b) \\[2mm] 0 & (s > b) \end{array}\right\} - \sum_{x_k > s} W_k\,(x_k - s)^N \tag{5.10.17}$$

where the notation in the right-hand member indicates that the sum is to be
taken over those values of k for which $x_k > s$. It is easily seen that G(s) vanishes
for all values of s outside the interval [A, B] over which the integration is
effected in (5.10.15).

† This form appears to be due to Peano [1913, 1914].

In illustration, we consider the simple integration formula

$$\int_{-1}^{1} f(x)\,dx = f(-\alpha) + f(\alpha) + R \qquad (0 \le \alpha \le 1) \tag{5.10.18}$$

where $\alpha$ is a fixed constant. It is seen that R = 0 for f(x) = 1 and for f(x) = x,
but that $R \ne 0$ for $f(x) = x^2$ unless $\alpha^2 = \tfrac{1}{3}$. Thus we have always $N \ge 1$,
and also N > 1 when $\alpha^2 = \tfrac{1}{3}$. Here [A, B] = [a, b] = [−1, 1] and w(x) = 1.
The use of (5.10.16) or (5.10.17), with N = 1, gives

$$1!\,G(s) = \int_{-1}^{1} (x - s)_+\,dx - (-\alpha - s)_+ - (\alpha - s)_+ = \int_s^1 (x - s)\,dx - \sum_{x_k > s} (x_k - s) \tag{5.10.19}$$

when $|s| \le 1$, so that

$$G(s) = \frac{(1 - s)^2}{2} - \sum_{x_k > s} (x_k - s) \tag{5.10.20}$$

where $x_0 = -\alpha$ and $x_1 = +\alpha$. Hence there follows

$$G(s) = \begin{cases} \dfrac{(1 - s)^2}{2} - (-\alpha - s) - (\alpha - s) = \dfrac{(1 + s)^2}{2} & (-1 \le s \le -\alpha) \\[2mm] \dfrac{(1 - s)^2}{2} - (\alpha - s) = \tfrac{1}{2}(s^2 + 1 - 2\alpha) & (-\alpha \le s \le \alpha) \\[2mm] \dfrac{(1 - s)^2}{2} & (\alpha \le s \le 1) \end{cases} \tag{5.10.21}$$

and, with G(s) so defined, the error R in (5.10.18) can be expressed in the form

$$R = \int_{-1}^{1} G(s)\,f''(s)\,ds \tag{5.10.22}$$

We may notice that this function G(s) is made up of the arcs of three
parabolas which join continuously at the transition points, coinciding with the
abscissas employed in (5.10.18). However, the slope G′(s) decreases abruptly
by unity as each such point is crossed in the positive direction. Also, in each
subinterval we have G″(s) = 1.

If $\alpha \le \tfrac{1}{2}$, G(s) vanishes only at the ends s = ±1, and at s = 0 in the
special case $\alpha = \tfrac{1}{2}$, and is otherwise positive in [−1, 1]. Hence, in this case, the
second law of the mean may be invoked to permit (5.10.22) to be written in the
form

$$R = f''(\xi)\int_{-1}^{1} G(s)\,ds = \frac{1 - 3\alpha^2}{3}\,f''(\xi) \qquad (0 \le \alpha \le \tfrac{1}{2}) \tag{5.10.23}$$

where $|\xi| < 1$. The formula (5.10.18) reduces to the midpoint rule over [−1, 1]
when $\alpha = 0$. When $\alpha = 1$, and the formula becomes the trapezoidal rule over
[−1, 1], there follows merely $G(s) = (s^2 - 1)/2$ for −1 < s < 1. In this case
G(s) is negative throughout the interior of the interval, so that the law of the
mean again can be applied, and (5.10.23) also holds in this case:

$$R = f''(\xi)\int_{-1}^{1} \frac{s^2 - 1}{2}\,ds = -\tfrac{2}{3}\,f''(\xi) \qquad (\alpha = 1) \tag{5.10.24}$$

If $\tfrac{1}{2} < \alpha < 1$, G(s) changes sign at $s = \pm\sqrt{2\alpha - 1}$, and (5.10.22) cannot be
transformed in this way. However, in any case it can be deduced that

$$|R| \le \max_{|x| \le 1} |f''(x)| \int_{-1}^{1} |G(s)|\,ds \tag{5.10.25}$$
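The representation (5.10.22) can be verified numerically for a particular
integrand. The following Python sketch evaluates G(s) from (5.10.20), integrates
G(s) f″(s) by a composite midpoint rule, and compares the result with the error
of (5.10.18) computed directly. The integrand $e^x$, the value $\alpha = 0.3$, and
the helper names are assumptions chosen only for illustration.

```python
import math

def G(s, alpha):
    """Influence function (5.10.20) for the rule f(-alpha) + f(alpha), N = 1."""
    if abs(s) > 1:
        return 0.0
    # The max(..., 0) terms are the truncated powers (x_k - s)_+.
    return (1 - s) ** 2 / 2 - max(-alpha - s, 0.0) - max(alpha - s, 0.0)

def integrate(g, a, b, n=20000):
    """Composite midpoint rule, adequate here because G is continuous."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

alpha = 0.3
# Error of (5.10.18) for f(x) = exp(x), computed directly.
direct = (math.e - 1 / math.e) - (math.exp(-alpha) + math.exp(alpha))
# The same error from (5.10.22); here f'' is again exp.
kernel = integrate(lambda s: G(s, alpha) * math.exp(s), -1.0, 1.0)
# Integral of G alone, which (5.10.23) gives as (1 - 3 alpha^2)/3.
area = integrate(lambda s: G(s, alpha), -1.0, 1.0)

print(direct, kernel, area, (1 - 3 * alpha**2) / 3)
```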


In the special case in which $\alpha = \sqrt{3}/3$ in (5.10.18), R vanishes also for
$f(x) = x^2$ and for $f(x) = x^3$, but does not vanish for $f(x) = x^4$. Hence the
degree of precision is then 3, and we may obtain a more useful formula by taking
N = 3, in accordance with which

$$24\,G(s) = \begin{cases} (1 + s)^4 & (-1 \le s \le -\alpha) \\ s^4 + 6(1 - 2\alpha)s^2 + (1 - 4\alpha^3) & (-\alpha \le s \le \alpha) \\ (1 - s)^4 & (\alpha \le s \le 1) \end{cases} \tag{5.10.26}$$

where $\alpha = \sqrt{3}/3$. It is easily verified that G(s) is continuous and that it vanishes
only at the ends of the interval, so that the second law of the mean may be
invoked to give

$$R = f^{\mathrm{iv}}(\xi)\int_{-1}^{1} G(s)\,ds = \tfrac{1}{135}\,f^{\mathrm{iv}}(\xi)$$

and hence there follows

$$\int_{-1}^{1} f(x)\,dx = f\!\left(-\frac{\sqrt{3}}{3}\right) + f\!\left(\frac{\sqrt{3}}{3}\right) + \tfrac{1}{135}\,f^{\mathrm{iv}}(\xi) \tag{5.10.27}$$

where $|\xi| < 1$.†


† This remarkable formula is a member of the class of so-called Gauss quadrature
formulas (to be considered in Sec. 8.5) as well as the class of Chebyshev quadrature
formulas (to be treated in Sec. 8.14).

This example may serve to indicate the use of the influence function in
other cases. From the definition (5.10.17), it is easily seen that G(s) and its
first N − 1 derivatives are continuous at the transition points and that they all
vanish at the end points, x = A and x = B, of the interval of integration in
(5.10.15). Further, it is found from (5.10.17) that

$$(-1)^N G^{(N)}(s) = \int_s^b w(x)\,dx - \sum_{x_k > s} W_k \tag{5.10.28}$$

and

$$G^{(N+1)}(s) = (-1)^{N+1} w(s) \tag{5.10.29}$$

in each subinterval, with the convention that w(x) is to be taken as zero when x
is outside [a, b] in both (5.10.28) and (5.10.29). Thus, $(-1)^N G^{(N)}(s)$ increases
abruptly by $W_i$ as s increases through the ith abscissa, but is continuous inside
each subinterval.

In the usual cases when none of the relevant ordinates correspond to
abscissas outside (a, b), so that (A, B) = (a, b), it follows that G and its first
N − 1 derivatives vanish at the end points of that interval. If, in addition, the
ordinates at the end points are not involved, so that the formula is of open
type, it follows that $G^{(N)}(s)$ also vanishes at those points.†

It may be seen that, if G(s) does not change sign in [A, B], the use of the
second law of the mean shows that (5.10.15) is expressible in the form

$$R[f(x)] = K f^{(N+1)}(\xi) \qquad (A < \xi < B) \tag{5.10.30}$$

where K is independent of f(x). In particular, if we take $f(x) = x^{N+1}$ there
follows

$$R[x^{N+1}] = (N + 1)!\,K$$

Thus K is determined, and, from (5.10.30), we deduce that

$$R[f(x)] = \frac{f^{(N+1)}(\xi)}{(N + 1)!}\,R[x^{N+1}] \tag{5.10.31}$$

if G(s) does not change sign in [A, B].

In illustration, we have seen that the function G(s) associated with (5.10.18)
does not change sign in the cases when $0 \le \alpha \le \tfrac{1}{2}$ or when $\alpha = 1$, and that
then N = 1. Thus, in place of evaluating the integral involved in (5.10.23)
in those cases, we can use (5.10.31) to obtain the same result more easily:

$$R[f(x)] = \frac{f''(\xi)}{2}\left[\int_{-1}^{1} x^2\,dx - \alpha^2 - \alpha^2\right] = \frac{1 - 3\alpha^2}{3}\,f''(\xi)$$

† For this reason, the presence of a singular derivative of f(x) at an end point tends to
be somewhat less troublesome for an open formula than for a comparable closed one.

However, the initial labor of determining G(s) and actually investigating
whether or not it changes sign in [A, B] may be appreciable when N is moderately
large. The preceding simple example shows that the requirement that the
weights $W_k$ be positive is not sufficient to guarantee that G(s) will be of constant
sign.
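When G(s) is known to be of constant sign, the constant K in (5.10.30) follows
from (5.10.31) by applying the formula to a single monomial. The Python sketch
below does this for the two-point formula (5.10.18); the function names are
illustrative, and the code does not itself test whether G(s) changes sign, so
that it is meaningful only for the cases identified in the text.

```python
import math

def rule_error_on_monomial(alpha, power):
    """R[x^power] for the formula (5.10.18) on [-1, 1]."""
    exact = (1 - (-1) ** (power + 1)) / (power + 1)   # integral of x^power
    return exact - ((-alpha) ** power + alpha ** power)

def error_constant(alpha, N):
    """K in R = K f^(N+1)(xi), from (5.10.31); assumes G(s) keeps one sign."""
    return rule_error_on_monomial(alpha, N + 1) / math.factorial(N + 1)

print(error_constant(0.0, 1))                 # midpoint rule over [-1, 1]
print(error_constant(1.0, 1))                 # trapezoidal rule over [-1, 1]
print(error_constant(math.sqrt(3) / 3, 3))    # two-point Gauss case, N = 3
```

The three printed values should agree with the coefficients in (5.10.23),
(5.10.24), and (5.10.27), namely 1/3, −2/3, and 1/135.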

A third form of the error term, complementing the alternatives (5.10.6)
and (5.10.15), can be obtained by replacing f(x) by the sum of the polynomial
$y_n(x)$, which agrees with it at the n + 1 points $x_0, x_1, \ldots, x_n$ involved in the
integration formula, and the appropriate remainder term (2.6.1), so that we
write

$$f(x) = y_n(x) + \pi(x)\,f[x_0, x_1, \ldots, x_n, x] \tag{5.10.32}$$

where, as before,

$$\pi(x) = (x - x_0)(x - x_1)\cdots(x - x_n) \tag{5.10.33}$$

If we suppose that the degree of precision of (5.10.1) is at least equal to n,
as is true for most of the useful formulas, the polynomial $y_n(x)$ is annihilated
by the operator in (5.10.2). Since also the remainder term in (5.10.32) vanishes
when $x = x_i$, for i = 0, 1, ..., n, there follows simply

$$R[f(x)] = \int_a^b w(x)\,\pi(x)\,f[x_0, x_1, \ldots, x_n, x]\,dx \tag{5.10.34}$$


In many cases of interest, there exists a function V(x) such that

$$w(x)\,\pi(x) = \frac{d^r V(x)}{dx^r} \tag{5.10.35}$$

where V(x) and its first r − 1 derivatives vanish for both x = a and x = b,
for some positive integer r. Under this assumption, the result of integrating
(5.10.34) by parts r times is seen to be

$$R[f(x)] = (-1)^r\int_a^b V(x)\,\frac{d^r}{dx^r}\,f[x_0, x_1, \ldots, x_n, x]\,dx$$

and, after making use of (2.3.9) and (3.3.14), combined in the form

$$\frac{d^r}{dx^r}\,f[x_0, x_1, \ldots, x_n, x] = \frac{r!}{(n + r + 1)!}\,f^{(n+r+1)}(\eta) \tag{5.10.36}$$

where $\eta$ is interior to the interval spanned by the n + 2 arguments on the left,
there follows

$$R[f(x)] = \frac{(-1)^r\,r!}{(n + r + 1)!}\int_a^b V(x)\,f^{(n+r+1)}(\eta)\,dx \tag{5.10.37}$$

If also V(x) is of constant sign in [a, b], this result can be further simplified
to the form

$$R = \frac{(-1)^r\,r!\,f^{(n+r+1)}(\xi)}{(n + r + 1)!}\int_a^b V(x)\,dx \tag{5.10.38}$$

where $\xi$ lies between the smaller of a and $x_0$ and the larger of b and $x_n$. In
addition, by integrating by parts r times, and again making use of (5.10.35)
and of the assumed properties of V(x), we find that

$$\int_a^b V(x)\,dx = \frac{1}{r!}\int_a^b V(x)\,\frac{d^r}{dx^r}\left[x^r + u_{r-1}(x)\right] dx = \frac{(-1)^r}{r!}\int_a^b \left[x^r + u_{r-1}(x)\right] w(x)\,\pi(x)\,dx$$

where $u_{r-1}(x)$ is an arbitrary polynomial of degree r − 1 or less, which can
be taken to be identically zero. Hence (5.10.38) is also expressible in the
equivalent form

$$R = \frac{f^{(n+r+1)}(\xi)}{(n + r + 1)!}\int_a^b x^r\,w(x)\,\pi(x)\,dx \tag{5.10.39}$$

where $x^r$ can be replaced by any convenient monic polynomial of degree r (in
which the coefficient of $x^r$ is unity) if so desired.

This result will be of particular usefulness in Chap. 8. In the case of the
formula (5.10.18), it is found that

$$w(x)\,\pi(x) = x^2 - \alpha^2 = \frac{d}{dx}\left[\tfrac{1}{3}x^3 - \alpha^2 x + \left(\tfrac{1}{3} - \alpha^2\right)\right]$$

where the constant of integration is determined so that the function in brackets
vanishes when x = −1. That function will also vanish when x = +1 if $\alpha^2 = \tfrac{1}{3}$,
in which case there follows further

$$w(x)\,\pi(x) = \frac{d^2}{dx^2}\left[\tfrac{1}{12}(1 - x^2)^2\right]$$

so that we may take $V(x) = (1 - x^2)^2/12$ in that case. The use of (5.10.38)
or (5.10.39), with n = 1 and r = 2, leads again to the result given in (5.10.27).
It may be noticed that if (5.10.38) or (5.10.39) is valid, the degree of precision
of the relevant integration formula is n + r.
In order to express in a different form the conditions permitting the use of
(5.10.38) or (5.10.39), we may make use of Theorem 12 of Sec. 1.9, to show that,
if $V^{(r)}(x) = w(x)\,\pi(x)$ and if $V, V', \ldots, V^{(r-1)}$ vanish at x = a, there follows

$$V(x) = \frac{1}{(r - 1)!}\int_a^x (x - s)^{r-1}\,w(s)\,\pi(s)\,ds \tag{5.10.40}$$

and also the requirements that $V, V', \ldots, V^{(r-1)}$ also vanish at x = b take
the form

$$\int_a^b (b - s)^k\,w(s)\,\pi(s)\,ds = 0 \qquad (k = 0, 1, 2, \ldots, r - 1) \tag{5.10.41}$$

Further, if we assume that the degree of precision of (5.10.1) is n + r, where
$r \ge 1$, it follows that the right-hand member of (5.10.34) will vanish when
f(x) is any polynomial of degree n + r or less, or, equivalently, when the divided
difference $f[x_0, x_1, \ldots, x_n, x]$, of order n + 1, is any polynomial of degree
r − 1 or less. But this situation implies the truth of (5.10.41).

Hence we may deduce that if the degree of precision of the integration
formula (5.10.1) is n + r, where $r \ge 1$, and if the function

$$V(x) = \frac{1}{(r - 1)!}\int_a^x (x - s)^{r-1}\,w(s)\,\pi(s)\,ds$$

does not change sign in [a, b], then the error R is given by (5.10.38) or (5.10.39).


5.11 Other Representations of Error Terms 


If the degree of precision of (5.10.1) is exactly $n$, where $n + 1$ ordinates are
used, the function $V(x)$ defined by (5.10.35) will not vanish at both ends of the
interval $[a, b]$ when $r \ge 1$, so that (5.10.38) and (5.10.39) then are not valid.
Whereas the use of the $G$ function of the preceding section generally involves
the individual consideration of each of the segments $[x_i, x_{i+1}]$, and whereas
the vanishing of $\pi(x)$ at each abscissa $x_i$ would require the same subdivision of
$[a, b]$ before the second law of the mean could be used in connection with
(5.10.34), it may be possible to define $V$ functions which are appropriate to
subintervals comprising several such segments, and so to obtain a more useful
form of the remainder with decreased labor.

In illustration, a formula approximating the integral of f(x) over [0, 3] 
by a linear combination of the three ordinates at x = 0, 1, 2, with w(x) = 1, 
would possess the error term 


$$R = \int_0^3 \pi(x)\,f[0, 1, 2, x]\,dx \qquad \pi(x) = x(x - 1)(x - 2) \tag{5.11.1}$$


if its degree of precision were at least 2, by (5.10.34). Here we have

$$\pi(x) = x^3 - 3x^2 + 2x = \frac{d}{dx}\left[\tfrac14\left(x^4 - 4x^3 + 4x^2\right)\right] = \frac{d}{dx}\left[\tfrac14 x^2(x - 2)^2\right]$$

so that the function $V(x) = x^2(x - 2)^2/4$ is appropriate for the subinterval
$[0, 2]$. In the remaining subinterval $[2, 3]$, $\pi(x)$ does not change sign. Hence
we may deduce that

$$\begin{aligned}
R &= -\int_0^2 V(x)\,f[0, 1, 2, x, x]\,dx + \int_2^3 \pi(x)\,f[0, 1, 2, x]\,dx \\
  &= -\frac{f^{iv}(\xi_1)}{4!}\int_0^2 V(x)\,dx + \frac{f'''(\xi_2)}{3!}\int_2^3 \pi(x)\,dx \\
  &= -\tfrac{1}{90}f^{iv}(\xi_1) + \tfrac38 f'''(\xi_2)
\end{aligned} \tag{5.11.2}$$

where both $\xi_1$ and $\xi_2$ lie in $(0, 3)$.
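
The two numerical coefficients in (5.11.2) can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the text: the helper names `poly_mul` and `poly_integral` are illustrative, and polynomials are stored as coefficient lists with the lowest degree first.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += Fraction(a) * Fraction(b)
    return out

def poly_integral(p, lo, hi):
    """Exact definite integral of the polynomial p over [lo, hi]."""
    def antiderivative(x):
        return sum(Fraction(c) / (k + 1) * Fraction(x) ** (k + 1)
                   for k, c in enumerate(p))
    return antiderivative(hi) - antiderivative(lo)

# pi(x) = x (x - 1)(x - 2) and V(x) = x^2 (x - 2)^2 / 4, as in the example above
pi_poly = poly_mul(poly_mul([0, 1], [-1, 1]), [-2, 1])
x_times_x_minus_2 = poly_mul([0, 1], [-2, 1])
v_poly = [c / 4 for c in poly_mul(x_times_x_minus_2, x_times_x_minus_2)]

# V'(x) should reproduce pi(x)
v_prime = [k * c for k, c in enumerate(v_poly)][1:]

coef_f4 = -poly_integral(v_poly, 0, 2) / 24   # multiplies the fourth derivative
coef_f3 = poly_integral(pi_poly, 2, 3) / 6    # multiplies the third derivative
print(coef_f4, coef_f3)                       # -1/90 3/8
```

The check that `v_prime` equals `pi_poly` confirms that $V' = \pi$ on $[0, 2]$, which is the property the V method relies on.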
In other cases, the function $Q(x)$ defined by the relations

$$Q'(x) = \frac{w(x)\pi(x)}{x - x_k} \qquad Q(A_k) = 0 \tag{5.11.3}$$

or, equivalently,

$$Q(x) = \int_{A_k}^{x} \frac{w(t)\pi(t)}{t - x_k}\,dt \tag{5.11.4}$$


where $x_k$ is one of the abscissas, may have the property that it does not change
sign in the subinterval $[A_k, x_k]$ of $[a, b]$, when $A_k$ is suitably chosen. In view
of the identity

$$(x - x_k)\,f[x_0, \ldots, x_n, x] = f[x_0, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n, x] - f[x_0, \ldots, x_n] \tag{5.11.5}$$


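where the second term on the right is independent of $x$, we can write

$$\begin{aligned}
\int_{A_k}^{x_k} w(x)\pi(x)\,f[x_0, \ldots, x_n, x]\,dx
 &= \int_{A_k}^{x_k} Q'(x)\left\{f[x_0, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n, x] - f[x_0, \ldots, x_n]\right\}dx \\
 &= \Big[Q(x)\left\{f[x_0, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n, x] - f[x_0, \ldots, x_n]\right\}\Big]_{A_k}^{x_k} \\
 &\qquad - \int_{A_k}^{x_k} Q(x)\,f[x_0, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n, x, x]\,dx
\end{aligned} \tag{5.11.6}$$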


after an integration by parts. Now $Q(x)$ vanishes when $x = A_k$, and its coefficient
in the integrated term vanishes when $x = x_k$. Since also $Q(x)$ is assumed not to
change sign in $[A_k, x_k]$, the second law of the mean is applicable to the second
term, and there follows

$$\int_{A_k}^{x_k} w(x)\pi(x)\,f[x_0, \ldots, x_n, x]\,dx = -\frac{f^{(n+1)}(\xi)}{(n+1)!}\int_{A_k}^{x_k} Q(x)\,dx \tag{5.11.7}$$
Also, if we notice that $\int Q(x)\,dx = \int Q(x)\,d(x - x_k)$, and integrate by parts,
there follows

$$\int_{A_k}^{x_k} Q(x)\,dx = \Big[(x - x_k)Q(x)\Big]_{A_k}^{x_k} - \int_{A_k}^{x_k} (x - x_k)Q'(x)\,dx
= -\int_{A_k}^{x_k} w(x)\pi(x)\,dx$$

so that (5.11.7) becomes

$$\int_{A_k}^{x_k} w(x)\pi(x)\,f[x_0, \ldots, x_n, x]\,dx = \frac{f^{(n+1)}(\xi)}{(n+1)!}\int_{A_k}^{x_k} w(x)\pi(x)\,dx \tag{5.11.8}$$

Thus, in spite of the fact that $\pi(x)$ may change sign in $[A_k, x_k]$, it follows
that the result of formally applying the law of the mean to the left-hand member
of (5.11.8), and then using (5.10.36), with $r = 0$, yields a correct result when
the function $Q(x)$ defined by (5.11.3) or (5.11.4) does not change sign in $[A_k, x_k]$.

We may notice also that if, instead, $Q(x)$ does not change sign between
$x = A_k$ and $x = B_k$, and if $Q(B_k) = 0$, there follows also

$$\int_{A_k}^{B_k} w(x)\pi(x)\,f[x_0, \ldots, x_n, x]\,dx = \frac{f^{(n+1)}(\xi)}{(n+1)!}\int_{A_k}^{B_k} w(x)\pi(x)\,dx \tag{5.11.9}$$


by a slight modification of the same argument. 
As a first example, we notice that the error term relevant to the Newton-
Cotes four-point formula of closed type with $h = 1$

$$\int_0^3 f(x)\,dx = \tfrac38\left[f(0) + 3f(1) + 3f(2) + f(3)\right] + R \tag{5.11.10}$$

is of the form

$$R = \int_0^3 \pi(x)\,f[0, 1, 2, 3, x]\,dx \qquad \pi(x) = x(x - 1)(x - 2)(x - 3) \tag{5.11.11}$$


Here the use of the function $V(x)$ is found to be inappropriate. However, we
find that

$$\frac{\pi(x)}{x - 3} = \frac{d}{dx}\left[\tfrac14 x^2(x - 2)^2\right]$$

so that the function $Q(x) = x^2(x - 2)^2/4$, corresponding to the choice $A_k = 0$
in (5.11.3), is nonnegative for $0 \le x \le 3$ (as well as for all other real values of $x$).
Hence (5.11.8) applies, with $A_k = 0 = a$ and $x_k = 3 = b$, and it yields

$$R = \frac{f^{iv}(\xi)}{4!}\int_0^3 \pi(x)\,dx = -\tfrac{3}{80}f^{iv}(\xi) \tag{5.11.12}$$

in accordance with (3.5.12).
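
A quick numerical confirmation of (5.11.12) is possible by applying the rule (5.11.10) to $f(x) = x^4$, for which $f^{iv}(x) = 24$ everywhere, so that the error must equal $-\tfrac{3}{80}\cdot 24$ exactly. The sketch below also samples $Q(x)$ on a grid as a sanity check of its sign; the function names are illustrative only.

```python
from fractions import Fraction

def three_eighths(f):
    """Newton-Cotes four-point closed rule on [0, 3] with h = 1, as in (5.11.10)."""
    return Fraction(3, 8) * (f(0) + 3 * f(1) + 3 * f(2) + f(3))

def f(x):
    # Test function whose fourth derivative is the constant 24
    return Fraction(x) ** 4

exact = Fraction(3) ** 5 / 5          # integral of x^4 over [0, 3]
R = exact - three_eighths(f)

def Q(x):
    # Q(x) = x^2 (x - 2)^2 / 4, the function used in the Q method above
    return x * x * (x - 2) ** 2 / 4

grid = [Fraction(k, 100) for k in range(301)]
q_nonnegative = all(Q(x) >= 0 for x in grid)

print(R, Fraction(-3, 80) * 24, q_nonnegative)   # -9/10 -9/10 True
```

A grid sample does not prove that $Q$ keeps one sign, of course; here that follows at once from $Q$ being a perfect square divided by 4.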
As a second example, we consider the Newton-Cotes two-point formula
of open type with $h = 1$

$$\int_0^3 f(x)\,dx = \tfrac32\left[f(1) + f(2)\right] + R \tag{5.11.13}$$

for which we may write

$$R = \int_0^3 \pi(x)\,f[1, 2, x]\,dx \qquad \pi(x) = (x - 1)(x - 2) \tag{5.11.14}$$
Again the use of $V(x)$ is inappropriate. However, we have

$$\frac{\pi(x)}{x - 2} = x - 1 = \frac{d}{dx}\left[\tfrac12 x(x - 2)\right]$$

corresponding to the choice $x_k = 2$, $A_k = 0$ in (5.11.3), so that (5.11.8) applies
over $[0, 2]$. Since $\pi(x)$ does not change sign in $[2, 3]$, we may write

$$R = \frac{f''(\xi_1)}{2!}\int_0^2 \pi(x)\,dx + \frac{f''(\xi_2)}{2!}\int_2^3 \pi(x)\,dx
= \tfrac13 f''(\xi_1) + \tfrac{5}{12}f''(\xi_2)$$

and, since the numerical coefficients are of the same sign, we may combine
the terms in the form

$$R = \tfrac34 f''(\xi) \tag{5.11.15}$$

in accordance with (3.5.18).†
† The same methods apply, in particular, to all Newton-Cotes formulas which employ
an even number of ordinates, whereas the V method succeeds when an odd number
of ordinates is used. The methods are based on analyses given by Steffensen in those
cases.

The V and Q methods, when applicable, are usually considerably more
convenient than the more general G method of Sec. 5.10, which generally
entails the determination and analysis of $n$ or more distinct functions [each a
polynomial of degree $N + 1$ if $w(x) = 1$] when $n + 1$ ordinates are involved.
However, it must be noticed that the V and Q methods are not applicable in
those cases when the degree of precision of the integration formula is less than $n$.
Formulas which involve values of certain derivatives of $f(x)$ as well as
the value of $f(x)$ itself, at certain points, may be considered as limits of formulas
in which $r + 1$ abscissas coalesce into a single abscissa, corresponding to which
the values of $f, f', \ldots,$ and $f^{(r)}$ are used. Thus, for example, if the coefficients
$W_0$, $W_1$, $W_2$, and $C_1$ are determined in such a way that the formula

$$\int_{-1}^{1} w(x)f(x)\,dx \approx W_0 f(-1) + W_1 f(0) + W_2 f(1) + C_1 f'(0) \tag{5.11.16}$$

is exact for $f(x) = 1$, $x$, $x^2$, and $x^3$, and so for any polynomial of degree 3 or
less, the error term will be of the form

$$R = \int_{-1}^{1} w(x)(x + 1)x^2(x - 1)\,f[-1, 0, 0, 1, x]\,dx \tag{5.11.17}$$

Here the second law of the mean applies directly and gives the simpler result

$$R = \frac{f^{iv}(\xi)}{4!}\int_{-1}^{1} w(x)\,x^2(x^2 - 1)\,dx \tag{5.11.18}$$

which yields

$$R = -\tfrac{1}{90}f^{iv}(\xi) \tag{5.11.19}$$

in the special case $w(x) = 1$.
However, for the formula

$$\int_{-1}^{1} f(x)\,dx = W_0 f(-1) + W_1 f(0) + W_2 f(1) + C_2 f'(1) \tag{5.11.20}$$

with the weighting coefficients determined by the same requirements, there
follows

$$R = \int_{-1}^{1} (x + 1)x(x - 1)^2\,f[-1, 0, 1, 1, x]\,dx \tag{5.11.21}$$

and, since here $\pi(x)$ changes sign at $x = 0$, another approach is needed. Since
also the function $\int_{-1}^{x} \pi(t)\,dt$ does not vanish when $x = 1$, the V method
fails. On the other hand, since

$$\frac{\pi(x)}{x - 1} = x(x^2 - 1) = \frac{d}{dx}\left[\tfrac14(x^2 - 1)^2\right]$$

the function $Q(x) = (x^2 - 1)^2/4$ is appropriate with $A_k = -1$, $x_k = 1$, and
Eq. (5.11.8) gives

$$R = \frac{f^{iv}(\xi)}{4!}\int_{-1}^{1} \pi(x)\,dx = -\tfrac{1}{90}f^{iv}(\xi) \tag{5.11.22}$$


The fact that (5.11.19) and (5.11.22) are both identical with the error
term relevant to Simpson’s rule (for which $W_0 = W_2 = \tfrac13$, $W_1 = \tfrac43$, and
$C_1 = 0$ or $C_2 = 0$) suggests that both (5.11.16) and (5.11.20) will reduce to
Simpson’s rule in the case $w(x) = 1$, when the weights are determined in such a
way that the degree of precision is at least 3, that is, that the weights $C_1$ and $C_2$
will be required to vanish. A direct derivation will confirm this suspicion.

The direct derivation of the error formula relevant to Simpson’s rule
itself, over $[-1, 1]$, is effected most easily by the V method since here

$$R = \int_{-1}^{1} \pi(x)\,f[-1, 0, 1, x]\,dx$$

where

$$\pi(x) = x(x^2 - 1) = \frac{d}{dx}\left[\tfrac14(x^2 - 1)^2\right] \equiv V'(x)$$

Thus there follows

$$R = -\tfrac14\int_{-1}^{1} (x^2 - 1)^2\,f[-1, 0, 1, x, x]\,dx
= -\frac{f^{iv}(\xi)}{4\cdot 4!}\int_{-1}^{1} (x^2 - 1)^2\,dx
= -\tfrac{1}{90}f^{iv}(\xi)$$


5.12 Supplementary References 


The use of symbolic methods essentially dates from Boole [1970 (1860)]. See 
also Michel [1946], Bickley [1948], and Steffensen [1950]. Useful tables of the 
coefficients in many finite-difference integration and differentiation formulas, 
with nonoperational derivations, are given in Singer [1964]. 

For Comrie’s method of bridging differences in subtabulation, see Hartree 
[1958]. 

The polynomials and numbers of Bernoulli, Euler, and Stirling are treated 
in Fort [1948], which also lists many other sources. 

Techniques for accelerating the convergence of series or sequences are 
developed by Bickley and Miller [1936], Szasz [1950], Cherry [1950], Rosser 
[1951], Shanks [1955], Wynn [1956], and Hamming [1962]. Series with known 
sums are listed by Jolley [1961] and by Mangulis [1965]. 

General expressions for remainder (error) terms are given by Peano 
[1913, 1914], Rémés [1940], and Sard [1948a]. See also Birkhoff [1906], 
Radon [1935], von Mises [1936], Daniell [1940], Householder [1953], and 
Kuntzmann [1959]. 


PROBLEMS 
Section 5.2 
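1 Obtain the formal relations

$$\mu = \tfrac12\left(E^{1/2} + E^{-1/2}\right) = \frac{2 + \Delta}{2\sqrt{1 + \Delta}} = \frac{2 - \nabla}{2\sqrt{1 - \nabla}} = \sqrt{1 + \tfrac14\delta^2}$$

and construct a table expressing each of the operators $E$, $\Delta$, $\nabla$, $\delta$, and $\mu$ similarly
in terms of each of the operators $E$, $\Delta$, $\nabla$, and $\delta$.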
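2 Establish the relations

$$\Delta = E\nabla \qquad \nabla = E^{-1}\Delta \qquad E^{-1/2}\Delta = E^{1/2}\nabla = \delta \qquad \Delta\nabla = \nabla\Delta = \Delta - \nabla = \delta^2$$
$$\mu\delta = \tfrac12(\Delta + \nabla) \qquad E^{1/2} = \mu + \tfrac12\delta \qquad \mu^2 = 1 + \tfrac14\delta^2$$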


3 Let $E_x$, $E_y$, $\Delta_x$, $\Delta_y$, and so forth, designate operators which affect only the variable
indicated by the subscript, with uniform spacings $h$ and $k$ implied in the $x$ and $y$
directions, respectively, so that, for example, $\delta_x^2 f_{0,0} = f_{1,0} - 2f_{0,0} + f_{-1,0}$
where $f_{r,s} \equiv f(x_0 + rh, y_0 + sk)$. By writing

$$f_{r,s} = E_x^r E_y^s f_{0,0}$$
and referring to the interpolation formulas of Newton, Stirling, Bessel, and Everett, 
deduce that a variety of two-dimensional interpolation formulas can be obtained 


by substituting one of the following indicated expansions for each operator, and 
truncating the result: 


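$$\begin{aligned}
E^p &= 1 + p\Delta + \frac{p(p - 1)}{2!}\Delta^2 + \cdots \\
    &= 1 + p\nabla + \frac{p(p + 1)}{2!}\nabla^2 + \cdots \\
    &= 1 + p\mu\delta + \frac{p^2}{2!}\delta^2 + \frac{p(p^2 - 1)}{3!}\mu\delta^3 + \cdots \\
    &= \left[\mu + \left(p - \tfrac12\right)\delta + \frac{p(p - 1)}{2!}\mu\delta^2 + \cdots\right]E^{1/2} \\
    &= (1 - p)\left[1 - \frac{p(2 - p)}{3!}\delta^2 - \cdots\right] + p\left[1 - \frac{(1 - p)(1 + p)}{3!}\delta^2 - \cdots\right]E
\end{aligned}$$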


Which pairs of expansions would be appropriate for interpolation near corners 
of a table? Near the borders? At interior points? 


4 By using the Newton forward-difference expansion in both directions in Prob. 3
and retaining only differences through the first in each direction, deduce the
approximate formula

$$f_{r,s} \approx (1 + r\Delta_x)(1 + s\Delta_y)f_{0,0}
= (1 - r)(1 - s)f_{0,0} + r(1 - s)f_{1,0} + s(1 - r)f_{0,1} + rs f_{1,1}$$

and show that this formula would yield exact results if $f(x, y)$ were of the form
$A + Bx + Cy + Dxy$. Also obtain the formula which neglects the mixed second
difference $\Delta_x\Delta_y f_{0,0}$, show that it would yield exact results if $f$ were of the form
$A + Bx + Cy$, and specialize both formulas when $r = s = \tfrac12$.
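5 By using the Everett expansion in both directions in Prob. 3 and neglecting
differences and mixed differences of order greater than 3, deduce the approximate
formula

$$\begin{aligned}
f_{r,s} \approx{} & (1 - r)(1 - s)\left[1 - \frac{r(2 - r)}{6}\delta_x^2 - \frac{s(2 - s)}{6}\delta_y^2\right]f_{0,0} \\
 & + r(1 - s)\left[1 - \frac{1 - r^2}{6}\delta_x^2 - \frac{s(2 - s)}{6}\delta_y^2\right]f_{1,0} \\
 & + (1 - r)s\left[1 - \frac{r(2 - r)}{6}\delta_x^2 - \frac{1 - s^2}{6}\delta_y^2\right]f_{0,1} \\
 & + rs\left[1 - \frac{1 - r^2}{6}\delta_x^2 - \frac{1 - s^2}{6}\delta_y^2\right]f_{1,1}
\end{aligned}$$

Show also that it would yield exact results for

$$f(x, y) = A + B_1x + B_2y + C_1x^2 + C_2xy + C_3y^2 + D_1x^3 + D_2x^2y + D_3xy^2 + D_4y^3 + E_1x^3y + E_2xy^3$$

and specialize the formula when $r = s = \tfrac12$.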

6 A table includes the following ordinates and differences, together with a state-
ment that differences of order 4 or greater are negligible. Use the formula of
Prob. 5 to interpolate for f(6.55, 1.05) and for f(6.524, 1.042).

              y = 1.0                          y = 1.1
  x     f(x, y)    δ_x²f   δ_y²f        f(x, y)    δ_x²f   δ_y²f
 6.5   0.9989623   −168    −31         0.9989783   −171    −28
 6.6   0.9990866   −147    −28         0.9991026   −150    −26

Section 5.3 


7 Express each of the operators $E$, $\Delta$, $\nabla$, $\delta$, $\mu$, and $\mu\delta$ in terms of $hD$.

8 Express the operator $\mu\delta$ in terms of $E$, $\Delta$, $\nabla$, $\delta$, and $hD$.

9 Show that the interpolation formulas of Stirling, Bessel, and Everett can be
obtained operationally by rewriting the relation $E^s = e^{shD}$ in the forms

$$E^s = \cosh shD + \frac{\sinh shD}{\cosh \tfrac12 hD}\,\mu$$

$$E^s = E^{s-1/2}E^{1/2} = \left[\frac{\cosh\left(s - \tfrac12\right)hD}{\cosh \tfrac12 hD}\,\mu + \sinh\left(s - \tfrac12\right)hD\right]E^{1/2}$$

and

$$E^s = \frac{\sinh shD}{\sinh hD}\,E + \frac{\sinh (1 - s)hD}{\sinh hD}$$

respectively, and expanding the right-hand members in powers of $\delta = 2\sinh \tfrac12 hD$
by using the results of Probs. 35 and 36 of Chap. 4 with $\alpha$ replaced by $hD$ and
$x$ by $s$ or 1. Why would the corresponding expansion of the simpler relation

$$E^s = e^{shD} = \cosh shD + \sinh shD$$

be of limited usefulness?

10 From the following rounded values of the function $f(x) = \sin x$, calculate
approximate values of $f'(x)$ and $f''(x)$ at each tabular point and compare the
results with rounded true values:

  x      0.5      0.7      0.9      1.1      1.3      1.5      1.7
  f(x)   0.47943  0.64422  0.78333  0.89121  0.96356  0.99749  0.99166


Section 5.4 
11 If $c_j$ is defined by (5.4.4), show that $c_j$ can be determined recursively by use of the
formula

$$c_j = \tfrac12 c_{j-1} - \tfrac13 c_{j-2} + \cdots + \frac{(-1)^{k+1}}{k + 1}c_{j-k} + \cdots$$

with $c_0 = 1$ and $c_{-1} = c_{-2} = \cdots = 0$. [Clear fractions in (5.4.4), replace
$\log(1 + \Delta)$ by its expansion, and equate coefficients.]
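
The recursion of Prob. 11 is easy to run in exact arithmetic. The sketch below assumes that (5.4.4) defines the $c_k$ as the coefficients in $\Delta/\log(1 + \Delta) = \sum c_k\Delta^k$, which is what the bracketed hint suggests but which is not reproduced in this section; the function name is illustrative.

```python
from fractions import Fraction

def gregory_coefficients(count):
    """Return [c_0, ..., c_{count-1}] from c_j = sum_{k>=1} (-1)^(k+1) c_{j-k} / (k+1).

    Terms with j - k < 0 are omitted, which is the convention c_{-1} = c_{-2} = ... = 0.
    """
    c = [Fraction(1)]                       # c_0 = 1
    for j in range(1, count):
        c.append(sum(Fraction((-1) ** (k + 1), k + 1) * c[j - k]
                     for k in range(1, j + 1)))
    return c

coeffs = gregory_coefficients(5)
print([str(value) for value in coeffs])     # ['1', '1/2', '-1/12', '1/24', '-19/720']
```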

12 Using the data of Prob. 10, calculate the approximate value of $\int_{0.5}^{x} f(t)\,dt$ for
$x$ = 0.7, 0.9, and 1.1, and the approximate value of $\int_{x}^{1.7} f(t)\,dt$ for $x$ = 1.1, 1.3,
and 1.5. From these results determine approximate values of the integral taken
over each tabular interval.


Section 5.5 


13 Using the data of Prob. 10, calculate approximate values of the quantities

$$\int_{0.5}^{0.7}\int_{0.5}^{x} f(t)\,dt\,dx \qquad \int_{1.5}^{1.7}\int_{1.5}^{x} f(t)\,dt\,dx$$
14 If $F''(x) = \log\tan x$, and if $F(1) = 0$ and $F'(1) = 1$, calculate approximate
values of $F(x)$ for $x$ = 1.00(0.02)1.10, using only tabulated five-place values of
$\log\tan x$ [$= (\log 10)(\log_{10}\tan x)$] for $x \ge 1$.
15 Show that, if the operator $\theta$ is defined by the relation

$$\theta f_k = \int_{x_k}^{x_k + rh}\int_{x_k}^{x} f(t)\,dt\,dx$$

then

$$\theta = \frac{E^r - 1 - rhD}{D^2} = h^2\,\frac{(1 + \Delta)^r - 1 - r\log(1 + \Delta)}{\Delta^2}\left[\frac{\Delta}{\log(1 + \Delta)}\right]^2$$

and determine the first three coefficients in the expansion of the operator $\theta$ in
powers of $\Delta$, as functions of $r$.

16 Show that the right-hand member of the result of operating on the equal members
of (5.5.13) by 1 + a,V + a,V? is independent of V* if and only if a, = —1, 
that the result is equivalent to (5.5.12) if also a, = 0, and that a particularly con- 
venient choice is that for which a, = 3, leading to the formula 


Pray — Pe — Peep + Pa-g = 3°01 - Vit Ὁ 2 + OV? 
+ ALV* + bv? Ὁ --)}ὃ}}Ὰ 


(This formula is used in Sec. 6.10.) 


Section 5.6 


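17 Using the data of Prob. 10, calculate approximate values of

$$\int_{1.1 - 0.2m}^{1.1 + 0.2m} f(x)\,dx$$

for $m$ = 1, 2, and 3.

18 Derive the operational relation

$$\int_{x_0}^{x_0 + h} f(x)\,dx = h\mu\,\frac{\tanh \tfrac12 hD}{\tfrac12 hD}\,f_{1/2}$$

and obtain the expansion in powers of $\delta$ in the form

$$\int_{x_0}^{x_0 + h} f(x)\,dx = h\mu\left(1 - \tfrac{1}{12}\delta^2 + \tfrac{11}{720}\delta^4 - \tfrac{191}{60480}\delta^6 + \cdots\right)f_{1/2}$$

(Compare with Prob. 18 of Chap. 4.) Show also that it can be expressed in the
alternative form

$$\frac1h\int_{x_0}^{x_1} f(x)\,dx = \tfrac12(f_0 + f_1) - \tfrac{1}{12}(\mu\delta f_1 - \mu\delta f_0) + \tfrac{11}{720}(\mu\delta^3 f_1 - \mu\delta^3 f_0)
 - \tfrac{191}{60480}(\mu\delta^5 f_1 - \mu\delta^5 f_0) + \cdots$$

and deduce Gauss’ sum formula in the form

$$\frac1h\int_{x_0}^{x_r} f(x)\,dx = \left(\tfrac12 f_0 + f_1 + \cdots + f_{r-1} + \tfrac12 f_r\right)
 - \tfrac{1}{12}(\mu\delta f_r - \mu\delta f_0) + \tfrac{11}{720}(\mu\delta^3 f_r - \mu\delta^3 f_0)
 - \tfrac{191}{60480}(\mu\delta^5 f_r - \mu\delta^5 f_0) + \cdots$$

19 Use the result of Prob. 18 and the data of Prob. 10 to calculate approximate
values of the integral $\int_{x}^{x + 0.2} f(t)\,dt$ for $x$ = 0.9 and 1.1.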
Section 5.7


20 Subtabulate the data of Prob. 10 for $x$ = 0.50(0.02)0.70 and $x$ = 1.50(0.02)1.70.

21 If $\delta'$ represents the central-difference operator relative to the spacing $h' = ph$,
show that

$$\frac{\delta'}{\delta} = \frac{\sinh \tfrac12 phD}{\sinh \tfrac12 hD}$$

and obtain the expansion of the right-hand member in powers of $\delta = 2\sinh \tfrac12 hD$
(see Prob. 36 of Chap. 4, with $x = p/2$, $\alpha = hD$, and $\beta = \delta$), thus deducing the
relation

$$\delta' = 2p\left[\frac{\delta}{2} - \frac{1}{3!}(1^2 - p^2)\left(\frac{\delta}{2}\right)^3 + \frac{1}{5!}(1^2 - p^2)(3^2 - p^2)\left(\frac{\delta}{2}\right)^5 - \cdots\right]$$

Show also that

$$\frac{\mu'\delta'}{\mu\delta} = \frac{\sinh phD}{\sinh hD}$$

and obtain the expansion of the right-hand member in powers of $\delta$ (see Prob. 35
of Chap. 4, with $x = p$, $\alpha = hD$, and $\beta = \delta$), thus deducing the relation

$$(\mu\delta)' \equiv \mu'\delta' = p\left[\mu\delta - \frac{1}{3!}(1^2 - p^2)\mu\delta^3 + \frac{1}{5!}(1^2 - p^2)(2^2 - p^2)\mu\delta^5 - \cdots\right]$$

22 In the case of subtabulation to tenths ($p = \tfrac{1}{10}$), deduce from the results of Prob.
21 the formulas

$$\begin{aligned}
(\mu\delta)' &= 0.1\mu\delta - 0.0165\mu\delta^3 + 0.00329175\mu\delta^5 - \cdots \\
\delta'^2 &= 0.01\delta^2 - 0.000825\delta^4 + \cdots \\
(\mu\delta^3)' &= 0.001\mu\delta^3 - 0.0002475\mu\delta^5 + \cdots \\
\delta'^4 &= 0.0001\delta^4 - \cdots \\
(\mu\delta^5)' &= 0.00001\mu\delta^5 - \cdots
\end{aligned}$$

when differences of order greater than five are neglected, and use these formulas
to subtabulate the data of Prob. 10 for $x$ = 0.90(0.02)1.10.

23 Suppose that mean values of $f(x)$ are known over each of the subintervals
$(x_k - h/2, x_k + h/2)$ ($k$ = 0, 1, 2, ...), where $x_{k+1} - x_k = h$, and that approx-
imate mean values over subintervals of length $2ph$, again centered about the points
$x_k$, are required. With the notations

$$m_k = \frac1h\int_{x_k - h/2}^{x_k + h/2} f(x)\,dx \qquad m_k' = \frac{1}{2ph}\int_{x_k - ph}^{x_k + ph} f(x)\,dx$$

derive the operational relation

$$m_k' = \frac{1}{2p}\,\frac{\sinh phD}{\sinh \tfrac12 hD}\,m_k$$

and deduce the formula

$$m_k' = \left[1 - \frac{1}{3!}(1^2 - 4p^2)\left(\frac{\delta}{2}\right)^2 + \frac{1}{5!}(1^2 - 4p^2)(3^2 - 4p^2)\left(\frac{\delta}{2}\right)^4 - \cdots\right]m_k$$

(See Prob. 36 of Chap. 4, with $x = p$, $\alpha = hD$, and $\beta = \delta$.) In particular, deduce
the formula

$$f_k = \left[1 - \frac{1^2}{3!}\left(\frac{\delta}{2}\right)^2 + \frac{1^2\cdot 3^2}{5!}\left(\frac{\delta}{2}\right)^4 - \cdots\right]m_k$$


Section 5.8 


24 The Bernoulli polynomial $B_k(x)$, of kth degree, is defined as the coefficient of
$u^k/k!$ in the expansion

$$\frac{u e^{xu}}{e^u - 1} = \sum_{\nu=0}^{\infty} B_\nu(x)\frac{u^\nu}{\nu!}$$

(a) By differentiating the equal members of this relation, deduce the differential
recurrence formula

$$B_k'(x) = kB_{k-1}(x) \qquad (k = 1, 2, \ldots)$$

and show also that $B_0(x) = 1$.
(b) By making use of the identity

$$\frac{(-u)e^{-u(1 - x)}}{e^{-u} - 1} = \frac{u e^{xu}}{e^u - 1}$$

prove that

$$B_k(1 - x) = (-1)^k B_k(x)$$

Also, by integrating the equal members of the defining relation over [0, 1],
deduce that

$$\int_0^1 B_k(x)\,dx = 0 \qquad (k > 0)$$

and use this result, together with the recurrence formula of (a), to show that

$$B_0(x) = 1 \qquad B_1(x) = x - \tfrac12 \qquad B_2(x) = x^2 - x + \tfrac16 \qquad B_3(x) = x^3 - \tfrac32 x^2 + \tfrac12 x$$

and so forth.

(c) In accordance with (5.8.7), the kth Bernoulli number $B_k$ is defined by the
relation $B_k = B_k(0)$. Show that

$$\frac{u}{e^u - 1} + \frac{u}{2} = \frac{u}{2}\coth\frac{u}{2}$$

is an even function of $u$, and hence deduce that $B_1 = -\tfrac12$ and that $B_{2m+1} = 0$
when $m \ge 1$.


(d) Use the identity

$$\frac{u e^{u/2}}{e^u - 1} = 2\,\frac{u/2}{e^{u/2} - 1} - \frac{u}{e^u - 1}$$

to deduce that

$$B_k(\tfrac12) = (2^{1-k} - 1)B_k$$

25 Use appropriate results of Prob. 24 to show that $B_{2m+1}(x)$ vanishes when $x$ = 0,
$\tfrac12$, and 1. Show also that, if it vanishes at any point inside [0, 1] in addition to
$x = \tfrac12$, then it must vanish at at least two such points. Then deduce that this
situation is impossible by using Rolle’s theorem to show that its existence would
imply that $B_{2m+1}'(x) = (2m + 1)B_{2m}(x)$ vanishes at least four times inside [0, 1],
that $B_{2m-1}(x)$ vanishes at at least two points inside [0, 1], in addition to $x = \tfrac12$,
and hence that $B_{2m-3}(x), \ldots, B_3(x)$ have the same property, thus establishing a
contradiction since $B_3(x) = x(x - \tfrac12)(x - 1)$. Show further that the function
$B_{2m+2}(x) - B_{2m+2}$ vanishes at the ends of the interval [0, 1], and
that its vanishing anywhere inside [0, 1] would contradict the preceding result.
Hence deduce that the function $B_{2m+2}(x) - B_{2m+2}$ vanishes at $x = 0$
and at $x = 1$, is of constant sign in [0, 1], and takes on its extreme value in that
interval at $x = \tfrac12$.

26 Use successive integrations by parts to show that

$$\begin{aligned}
\int_0^1 \left[B_{2m+2}(s) - B_{2m+2}\right]F^{(2m+2)}(s)\,ds
 ={} & \Big[\{B_{2m+2}(s) - B_{2m+2}\}F^{(2m+1)}(s) - B_{2m+2}'(s)F^{(2m)}(s) \\
 & + B_{2m+2}''(s)F^{(2m-1)}(s) - \cdots - B_{2m+2}^{(2m+1)}(s)F(s)\Big]_0^1 + \int_0^1 B_{2m+2}^{(2m+2)}(s)F(s)\,ds
\end{aligned}$$

Then by using results of Prob. 24, deduce the formula

$$\tfrac12\left[F(1) + F(0)\right] = \int_0^1 F(s)\,ds + \sum_{i=1}^{m}\frac{B_{2i}}{(2i)!}\left[F^{(2i-1)}(1) - F^{(2i-1)}(0)\right] + E$$

where

$$E = -\frac{1}{(2m + 2)!}\int_0^1 \left[B_{2m+2}(s) - B_{2m+2}\right]F^{(2m+2)}(s)\,ds$$

27 By summing the results of increasing the argument of $F$ successively by 0, 1,
2, ..., and $r - 1$, in Prob. 26, obtain the formula

$$\sum_{k=0}^{r} F(k) = \int_0^r F(s)\,ds + \tfrac12\left[F(0) + F(r)\right]
 + \sum_{i=1}^{m}\frac{B_{2i}}{(2i)!}\left[F^{(2i-1)}(r) - F^{(2i-1)}(0)\right] + E_m(r)$$

where

$$E_m(r) = -\frac{1}{(2m + 2)!}\int_0^1 \left[B_{2m+2}(s) - B_{2m+2}\right]\sum_{k=0}^{r-1}F^{(2m+2)}(s + k)\,ds$$
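
The summation formula of Prob. 27 can be tried out on a polynomial $F$, for which the error term vanishes as soon as $2m + 2$ exceeds the degree. The sketch below is only an illustration: it generates the Bernoulli numbers from the standard recurrence $\sum_{j=0}^{k}\binom{k+1}{j}B_j = 0$ (a well-known identity, not one taken from the problems above), and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli_numbers(n):
    """B_0, ..., B_n (with B_1 = -1/2) from sum_{j<=k} C(k+1, j) B_j = 0 for k >= 1."""
    B = [Fraction(1)]
    for k in range(1, n + 1):
        B.append(-sum(comb(k + 1, j) * B[j] for j in range(k)) / (k + 1))
    return B

def derivative(p, order):
    """Differentiate a coefficient list (lowest degree first) the given number of times."""
    for _ in range(order):
        p = [i * c for i, c in enumerate(p)][1:]
    return p

def evaluate(p, x):
    return sum(c * x ** i for i, c in enumerate(p))

def integral(p, lo, hi):
    return sum(Fraction(c, i + 1) * (hi ** (i + 1) - lo ** (i + 1))
               for i, c in enumerate(p))

def euler_maclaurin_sum(p, r, m):
    """Right-hand member of the Prob. 27 formula without the error term E_m(r)."""
    B = bernoulli_numbers(2 * m)
    total = integral(p, 0, r) + Fraction(1, 2) * (evaluate(p, 0) + evaluate(p, r))
    for i in range(1, m + 1):
        d = derivative(p, 2 * i - 1)
        total += B[2 * i] / factorial(2 * i) * (evaluate(d, r) - evaluate(d, 0))
    return total

F = [0, 0, 0, 0, 1]                                   # F(s) = s^4
direct = sum(evaluate(F, k) for k in range(6))        # 0^4 + 1^4 + ... + 5^4
formula = euler_maclaurin_sum(F, r=5, m=2)            # F^(6) = 0, so E_2(5) = 0
print(direct, formula)                                # 979 979
```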
28


29 


30 


Show that the error term in Prob. 27 can be written in the form 


Fmt 2)(σ) 


Ex) = —? Oy oy! 


1 

B 5 
Β.,,...,Δ(56) -- B ds = r 333 Ὲ2 (νι Ὲ2)(σ) 
[ [ 2m+2 om+2] | (2m =: 2)! 
for some o such that 0 < o < r,if F°?"*” is continuous in that interval, and also 
that, if F°"+(s) does not change sign for 0 < s <r, the error term can be 
expressed in the form 


Boms2(N) — Bam+2 τ’ 
E --- ~2m+2 ~— “2m+2 Femty) k 1) — Fem) k 
m(T) (am #2)! 2, [ (k + 1) (k)] 


ps Bom+2(N) — Bom+2 2m+1)y(p) _ F2m+1(G 
ΣΝ [ΡΤ Ὁ (0)] 


for some 7 such that 0 < ἡ < 1. Further, use the results of Probs. 25 and 24d 
to show that this term is numerically smaller than twice the first term neglected 
in the expansion of Prob. 27 and is of the same sign. [Notice that this expansion 
is reduced to that of (5.8.13) if F(s) is identified with f(xo + As), with the sub- 
stitution x9 + hs = x.] 

Show that 


[ p(x) dx — h(po + Di Ὁ + Pr) 


xo 


AA [a - EDS — α - EM] 


0 


h-*J --Ἰ "Ὁ -- Ε 
h | ———-_ Po τ ——___ -ΡῺ 
1-—E 1-E 


and, by expressing the operator affecting pp in terms of A and that affecting p, in 
terms ΟΥ̓͂Ν, deduce the Gregory summation formula in the operational form 


1 (* 
al p(x) dx = (po + py + ++ + Dr-a + Pr) 


xo 


_g@A)-1, — &#-V)—-1 
A Po (—V ) — PD, 
where 
“ ᾿ - 


-....- c,u* 
log(1 + 4) K=0 


gu) = 


with the notation of (5.4.4). 
Use the data given in Prob. 32 of Chap. 3 to obtain approximate values of the 


integral 
2 ‘ent! 2 dt 
wm Jo 


by means of the Euler-Maclaurin and Gregory formulas. 


31 


32 


33 
34 
35 
By replacing the derivatives at $x_0$ and at $x_r$ in (5.8.17) by combinations of mean
central differences using formulas such as 
hf’ = pof — ἐμδῆ + soud°f — --- h°f” = po°f — 4pd°f + --- 


deduce Gauss’ sum formula 


[foe- Ιαρ ἐλεεῖ τα 632 


- ape 11} 53, -- pdf) -- --- 
τ Wah -- μδϊ0} + (ube -- μδ) 


[See also Prob. 18. When the series is truncated, with differences of order 2k — 1, 
the error term is the result of replacing the contents of the parentheses in the 
first omitted term by rh?**+3f@*+2)(£) where & is between the extreme relevant 
values of x. (See Steffensen [1950].)] 
Approximate the value of the integral 


Yes 96516353 
_41+ x? 
by use of the Euler-Maclaurin and Gregory formulas with r = 8, investigating 
the effects of successive corrections through those of third order and retaining six 
decimal places. (See also Sec. 3.9.) 
Use Gauss’ formula (Prob. 31) in Prob. 32. 
Use the second Euler-Maclaurin formula (5.8.15) in Prob. 32. 
Express the first two pairs of end corrections in the second Euler-Maclaurin 
formula (5.8.15) in terms of forward and backward differences, as in Gregory’s 
formula, and use the result in Prob. 32. 


Section 5.9 


36 Show that if the first $N - 1$ terms of the series

$$S = \sum_{n=1}^{\infty}\frac{1}{n^2} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots = \frac{\pi^2}{6} \doteq 1.644934$$

are summed directly, and if the Euler-Maclaurin sum formula is used to approx-
imate the remainder, there follows

$$S \approx \left[1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots + \frac{1}{(N - 1)^2}\right]
 + \left[\frac{1}{N} + \frac{1}{2N^2} + \frac{1}{6N^3} - \frac{1}{30N^5} + \frac{1}{42N^7} - \cdots + \frac{B_{2m}}{N^{2m+1}}\right]$$

Then determine $N$ and $m$ in such a way that the number of terms to be retained is
minimized, assuming successively that approximations which round correctly to
5, 10, and 20 decimal places are required.
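
To see how quickly the correction terms in Prob. 36 take effect, the approximation can be evaluated for a small $N$. This is a sketch in ordinary floating point, with the illustrative choice $N = 5$ and the end corrections carried through the $B_6$ term; it does not settle the optimization the problem asks for.

```python
import math

def zeta2_euler_maclaurin(N):
    """Sum the first N - 1 terms of sum 1/n^2 directly, then add the
    Euler-Maclaurin estimate of the remainder through the B_6 term."""
    head = sum(1.0 / n ** 2 for n in range(1, N))
    tail = (1 / N + 1 / (2 * N ** 2) + 1 / (6 * N ** 3)
            - 1 / (30 * N ** 5) + 1 / (42 * N ** 7))
    return head + tail

approx = zeta2_euler_maclaurin(5)
error = approx - math.pi ** 2 / 6
print(approx, error)
```

Only four terms are summed directly here, yet the discrepancy is of the order of the first neglected correction, $|B_8|/N^9 = 1/(30\cdot 5^9)$.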
37 Suppose that neither $\sum_{k=0}^{r} F(k)$ nor $\int_0^r F(s)\,ds$ necessarily converges as $r \to \infty$, but
that their difference tends to a limit C, so that 


C = lim Ρ F(k) -- [ Fo 
r>0o [k=0 0 


and that F(s) and all its derivatives tend to zero as s - οὐ. Show that the Euler- 
Maclaurin expansion of Prob. 27 then can be written in the form 


ς ΠΝ - Boi peai-1) 
2, F(k) [ F(s) ds + C+ 4F(r) + pa ai)! F (r) + Er) 
where 
E,,Ar) = EL, Ar) = E,(00) 
1 co 
= πε τη [ [Bom+2(S) -- Bom+2] ΙΣ, Femt es + »| ds 


and also obtain results analogous to those of Prob. 28 in this case. Show further 
that 


= πο But μῶι-1) 
C = 3Ε(0) δ, ἘΠῚ F (0) + E,,(00) 
where 


_ Bom+2() — Bam+2 μ(ῶπι 1) 
E,() ao FEm*Y0Q) (0 «ἡ <1) 


if F°2"+2)(s) does not change sign for 0 < s < ©. 
38 Use the result of Prob. 37 to deduce the asymptotic expansion

$$1 + \frac12 + \frac13 + \cdots + \frac1n \sim \log n + C + \frac{1}{2n} - \frac{1}{12n^2} + \cdots - \frac{B_{2m}}{(2m)n^{2m}} - \cdots$$

where

$$C = \lim_{n\to\infty}\left(\sum_{k=1}^{n}\frac1k - \log n\right)$$

assuming the existence of this limit. Also show that

$$C = \frac12 + \frac{1}{12} - \frac{1}{120} + \cdots + \frac{B_{2m}}{2m} + E_m$$

where

$$E_m = -\frac{B_{2m+2}(\eta) - B_{2m+2}}{2m + 2} \qquad (0 < \eta < 1)$$

and that $E_m$ is of the same sign as the first neglected term and is less than twice as
large. Finally, determine the best approximation to $C$ obtainable from this
expansion and determine $C$ to five places by equating the two members of the
former expansion when $n = 10$. (The constant $C$ involved here is known as
Euler’s constant and is known to round to 0.5772156649.)
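
The second part of Prob. 38 amounts to solving the first expansion for $C$ at $n = 10$. A minimal floating-point sketch follows, using the Bernoulli numbers $B_2 = \tfrac16$, $B_4 = -\tfrac{1}{30}$, $B_6 = \tfrac{1}{42}$ and truncating after the $B_6$ term (the truncation point is an arbitrary choice made for the illustration).

```python
import math

n = 10
harmonic = sum(1.0 / k for k in range(1, n + 1))

# Even-index Bernoulli numbers used in the correction terms B_2m / (2m n^(2m))
bernoulli_even = {2: 1 / 6, 4: -1 / 30, 6: 1 / 42}
correction = sum(b / (two_m * n ** two_m) for two_m, b in bernoulli_even.items())

# Rearranged expansion: C = H_n - log n - 1/(2n) + sum of B_2m / (2m n^(2m))
C = harmonic - math.log(n) - 1 / (2 * n) + correction
print(C)
```

With these three corrections the value agrees with the quoted 0.5772156649 to well beyond the five places the problem requires.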


39 


40 


4] 


42 
Use an appropriate modification of the result of Prob. 37 to deduce the asymptotic
expansion 
log n! = log1 + log2 +--- + logn 
1 1 
=(n+4)logn+ K-n+— -—- — 
( 2) log 12n 360n3 
Bom a 
2m(2m — 1)n?"-1 
where 
K = lim [log n! — (n + 4) logn + 7] 


n—-> © 


assuming the existence of this limit, and show that 


Peis ye ον δὼ τ Ὁ 
12 360 2m(2m — 1) 


where E,, is of the same sign as the first neglected term and is less than twice as 
large. Also, calculate an approximate value of K from this expansion, and 
determine K to five places by setting n = 10 in the former one and using the 
fact that log 10! = 15.104412. The true value of K is known to be 4 log 2x = 
0.91894. leas this fact, deduce Stirling’s asymptotic formula for the factorial, 
in the form 


= Vann ntem* (1 Ἐπ: + aed 


12n 288n? 


Apply the Gregory formula to the approximate summation of the series 


> ; 
3 
a (2k + 1) 


to five places, after summing an appropriate number of terms in advance. 
Use each of the formulas (5.9.7) and (5.9.8) to sum the series 


ΣᾺ 
ke + 


to five places, after summing an appropriate number of terms in advance. 
Determine the Euler sum of each of the following divergent series: 


q@i-1l+1-—1+4---4(-1)' τ 
()1-24+3-44---4 (-1)""1n 4+... 
()1-—-24+4-—84.---+4 (—1)2" 4+ 


Also verify that the three series can be obtained formally by setting x = 1 in the 
power-series expansions of (1 + x)~1, (1 + x)~?, and (1 + 2x)~1, respectively, 
and that the Euler sum in each case is the value taken on by the generating function 
when x = 1. 
43


44 


45 


46 


Derive (5.9.12) by means of n iterations of the transformation considered in 
Prob. 8 of Chap. 1. 

Show that if N terms of the original series in (5.9.12) are summed in advance and 
if r terms of the second sum in the right-hand member are retained, there follows 


= N-1 eo: 
2 (— 1%. = 2h + (-1)* 3 > i=) Atf, 


ok 
k=0 
(-1)""" Γ--1 
+e > ODEN yin | + E 
k=0 


where 
(= ἢ trtet 1. .59 


Ε εν Σ (--1}} Δ fers 
ΚΞὸ 


Thus, noting that E depends on N + r, but noton Norr separately, deduce that 
the same result is obtained (1) by summing N terms of the original series in 
advance, then applying (5.9.12) to the remainder, with a chosen value of n, and re- 
taining r terms of the second sum and (2) by the process described by replacing 
N by N — mandrbyr +m,when-rimesN. Also verify this fact numer- 
ically in the case of the series 


S=1-4+4-4+-:: 


taking first N = 2, = 4, andr = 2, and then N = 3,1 = 4, and r = 1, and 
showing that both calculations yield the same result as does the retention of five 
terms in the transformed series in (5.9.11) (with N = 4, n = 4, and r= 0). 
(Thus this procedure is useful for remedying a situation in which it is found that 
an insufficient number N of terms was summed initially to make the Euler trans- 
formation effective, without sacrificing the calculation already completed.) 
Use the formula (5.9.14) to sum the Taylor series 


τῶ x* 1 
> (-D* — = — log (1 + x) (x <1 
k=0 k+1 x 


to five decimal places when x = 4. 
Derive (5.9.15) by writing 


S τούϑεσι = 3 CDi + Ale 


k=0 


Ι! 
raw 
! 
iMs 
~_“-, 
| 
—. 
~ 
ie) 
ra 
1 Ι 
iM 
κε. 
ὃν 
͵᾿Ὄ 
μα 
.. 
© 
Φ 


and formally interchanging the order of summation. 
47 (a) Show that
a* sin θ 
a” + 2at cos 0 + 12 
by taking the imaginary part of εἶθ > (— εἶθ}. 
(ὁ) Deduce from (5.9.15) the formal summation formula 


ree) k 
Σ (εἰς sin (k + 1)0 = (It! < [al) 


"ἡ a ok sin (k + 1)0 = σοῦᾷ, 8) + =e U,(A, 8) 
k=0 


+ X80 YC, 0) + 


where U(t, 0) = (a sin 0)/(a” + 2at cos θ + t”). (The parameter a can be 
taken to be 1 if the series on the left then converges.) 

48 (a) Show that the Cesaro transformation is not very effective for the sum 
S=1-434+41-—--:-, but that repeated application of the Hutton trans- 
formation to the partial sums So, S,,..., Sg yields a five-place approximation 
to S. 

(ὁ) Verify that the Cesdro and Hutton sums of the divergent series 1 — 1 + 
1 — --- + (—1)" + --- are equal to the Euler sum (Prob. 42a). 


Section 5.10 


49 Show that (x — s)", is a continuous function of x and s if > 0, and that 


[ (x —s)i. dx _ & - 9 ; (n # —1) 
n+1 ee ς 


Lae — si = n(x — 5) 
ax 
50 Derive (5.10.17) from (5.10.16) and, under the assumption that the degree of 
precision of (5.10.1) is at least N, show also that G(s) vanishes when s is outside 
(A, B), where A and B are the smallest and largest of x9, x1,..., Xn, a, and ὃ. 
51 Obtain the influence function $G(s)$ for which

$$\int_{-1}^{1} F(x)\,dx = \tfrac13\left[F(-1) + 4F(0) + F(1)\right] + \int_{-1}^{1} G(s)F^{iv}(s)\,ds$$

in the form

$$G(s) = -\tfrac{1}{72}(1 - |s|)^3(1 + 3|s|) \qquad (|s| \le 1)$$

and show that

$$\int_{-1}^{1} G(s)F^{iv}(s)\,ds = -\tfrac{1}{90}F^{iv}(\xi) \qquad (|\xi| < 1)$$

Also, by writing $x = (t - x_0 - h)/h$ and $F(x) = f(t)$, deduce Simpson’s rule in
the form (3.5.11).
52 Apply integration by parts to the result of Prob. 51, to show that the error in
Simpson’s rule, as applied to $F(x)$ over the interval $[-1, 1]$, can be expressed in
the alternative forms

$$\begin{aligned}
R &= -\tfrac{1}{72}\int_{-1}^{1} (1 - |s|)^3(1 + 3|s|)\,F^{iv}(s)\,ds \\
  &= -\tfrac16\int_{-1}^{1} s(1 - |s|)^2\,F'''(s)\,ds \\
  &= \tfrac16\int_{-1}^{1} (1 - |s|)(1 - 3|s|)\,F''(s)\,ds
\end{aligned}$$

and deduce that when the rule is applied to $f(x)$ over an interval $[x_0, x_0 + 2h]$,
there follows

$$|R| \le \frac{h^5}{90}M_4 \qquad |R| \le \frac{h^4}{36}M_3 \qquad |R| \le \frac{8h^3}{81}M_2$$

where $M_k$ is the maximum value of $|f^{(k)}(x)|$ on $[x_0, x_0 + 2h]$ under the assump-
tion that $f^{(k)}(x)$ exists and is integrable over that interval.

53 Determine $W_0$, $W_1$, and $W_2$, as functions of $\alpha$, in such a way that the error term
in the formula

$$\int_{-1}^{1} F(x)\,dx = W_0F(-\alpha) + W_1F(0) + W_2F(\alpha) + R \qquad (0 < \alpha \le 1)$$

vanishes when $F(x)$ is an arbitrary polynomial of degree 3 or less, showing that
the resultant formula is of the form

$$\int_{-1}^{1} F(x)\,dx = \frac{1}{3\alpha^2}\left[F(-\alpha) + 2(3\alpha^2 - 1)F(0) + F(\alpha)\right] + R$$

and that its degree of precision is 3 unless $\alpha^2 = \tfrac35$ and is 5 in that case. Also,
show that the influence function corresponding to $N = 3$ is given by

$$G(s) = \begin{cases}
\tfrac{1}{24}(1 - |s|)^4 - \dfrac{1}{18\alpha^2}(\alpha - |s|)^3 & (|s| \le \alpha) \\[1ex]
\tfrac{1}{24}(1 - |s|)^4 & (\alpha \le |s| \le 1)
\end{cases}$$

(Compare with Prob. 58.)

54 Show that the function $G(s)$ obtained in Prob. 53 does not change sign in $[-1, 1]$
when $\alpha = \tfrac12$, and deduce the formula

$$\int_{-1}^{1} F(x)\,dx = \tfrac23\left[2F(-\tfrac12) - F(0) + 2F(\tfrac12)\right] + \tfrac{7}{720}F^{iv}(\xi)$$

where $|\xi| < 1$. Also transform this result to the Newton-Cotes three-point formula
(3.5.20), of open type.
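
The weights in Probs. 53 and 54 follow from two moment conditions, so they are easy to tabulate exactly. In the sketch below the function names are illustrative. The quantity returned by `quartic_error_coefficient` is the residual of the rule on $x^4$ divided by $4!$. This equals the coefficient of $F^{iv}(\xi)$ only in those cases where the error can be written as a single mean-value term, which the code does not itself establish.

```python
from fractions import Fraction

def symmetric_three_point(alpha_sq):
    """Weights (outer, middle) of W [F(-a) + F(a)] + W1 F(0) on [-1, 1],
    chosen so that the rule is exact for 1 and x^2 (odd powers are exact by symmetry)."""
    w_outer = 1 / (3 * alpha_sq)        # exactness for x^2: 2 W a^2 = 2/3
    w_mid = 2 - 2 * w_outer             # exactness for 1:   2 W + W1 = 2
    return w_outer, w_mid

def quartic_error_coefficient(alpha_sq):
    """(exact integral of x^4 over [-1, 1] minus the rule value) divided by 4!."""
    w_outer, _ = symmetric_three_point(alpha_sq)
    residual = Fraction(2, 5) - 2 * w_outer * alpha_sq ** 2
    return residual / 24

for alpha_sq in (Fraction(1, 4), Fraction(3, 5), Fraction(1)):
    print(alpha_sq, symmetric_three_point(alpha_sq), quartic_error_coefficient(alpha_sq))
```

For $\alpha^2 = \tfrac14$ this reproduces the weights $\tfrac43, -\tfrac23, \tfrac43$ and the coefficient $\tfrac{7}{720}$ of Prob. 54; for $\alpha^2 = 1$ it gives Simpson's $-\tfrac{1}{90}$; and for $\alpha^2 = \tfrac35$ the residual vanishes, consistent with the higher degree of precision noted in Prob. 53.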


55 


56 


57 


58 
Show that the degree of precision of the formula
1 

Ϊ F(x) dx = «ς 1Ε) + 16F(0) + ΤΕ(-1)]} -- s[F'(1) -- F(-1)] + R 
—1 


is 5, obtain the influence function relative to N = 5 in the form 


G(s) = τέσσ( — {s\)*(1 + 4|5] + 552) 
and deduce that 


R= σήσξε ε  «([{] < 1) 


Also, generalize this result by writing x = (t — x9 — A)/h and F(x) = f(t). 
Show that the degree of precision of the formula 


F(1) — 2F(0) + F(—1) = ἐδ) + 10F’0) + F(-1)] + R 


is 5, and that R can be expressed in the form 
1 

R = τὲς | (1 — |s|)°(3s? — 6|s| — 2)F"(s) ds = -zhgF"%E) (lel < 1) 
-1 

Assuming that x» S x S x,, obtain g(x, 5) such that 


f(x) = αὐ τὰ - xq) ov τ Lo) 4 [ Ban Gases 


in the form 


hg(x, 5) = & — χρ)αι - x) (x) Ss Sx) 
—(% — χρ)α; — 5) (x Ss x) 


and deduce the more familiar form of the error term 


R = 16 — xox — xf") (Xo < € < x;) 


Show that the error term relevant to the formula of Prob. 53 can be written in 
the form 


1 
R= Ϊ V(x) F[—a, 0, α, x] dx 
= 


where V(x) = 4[(x? — a)? — (1 — α2)2], and deduce that 


3 — 5a? 
180 


R= FM) (Ié| < 1) 


when 0 < a? < 4ora? = 1. Also show that this result reduces to the results of 
Probs. 51 and 54 when « = 1 and 4, respectively, and, by determining « such 
that the weighting coefficients are equal, deduce the additional formula 


| " FG) de = 3 |F(- 2) + FO) + -(2)] + stoF™© 
-1 
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59 Determine ἃ such that the error term relevant to the formula of Prob. 53 can be 
written in the form 


1 
R= Ϊ μ"(.)Ὲ|--- αο, 0, α, x] dx 
=4 


where V, V’, and V” vanish for x = +1, show that then V(x) is nonpositive in 
{—1, 1], and deduce the formula 


1 - = 
Ϊ F(x) de = ξ{5Ἐ{-- V3) + 8F0) + 5Ε(Μ9)] + ho" a «ἢ 
1 


Section 5.11


60 By specializing the Newton-Cotes four-point formula of open type to the interval
[-2, 3] with h = 1, in the form

$$\int_{-2}^{3} F(x)\,dx = \tfrac{5}{24}\,[11F(-1) + F(0) + F(1) + 11F(2)] + \int_{-2}^{3} \pi(x)\,F[-1, 0, 1, 2, x]\,dx$$

where $\pi(x) = x(x^2 - 1)(x - 2)$, and considering the function

$$Q(x) = \int_{-2}^{x} \frac{\pi(t)}{t - 2}\,dt$$

show that the error can be expressed in the form

$$E = \frac{F^{iv}(\xi_1)}{4!}\int_{-2}^{2} \pi(x)\,dx + \frac{F^{iv}(\xi_2)}{4!}\int_{2}^{3} \pi(x)\,dx = \tfrac{95}{144}F^{iv}(\xi)$$

where $\xi_1$, $\xi_2$, and $\xi$ are in the interval (-2, 3).

61 Determine $W_0$, $W_1$, and $W_2$ such that the formula

$$\int_{0}^{2} x\,F(x)\,dx = W_0 F(0) + W_1 F(1) + W_2 F(2) + R$$

possesses a degree of precision of at least 2, and show that the resultant formula
takes the form

$$\int_{0}^{2} x\,F(x)\,dx = \tfrac{2}{3}\,[2F(1) + F(2)] - \tfrac{2}{45}F'''(\xi) \qquad (0 < \xi < 2)$$

62 Derive the formula

$$\int_{-1}^{1} \frac{F(x)}{\sqrt{1 - x^2}}\,dx = \frac{\pi}{4}\,[F(-1) + 2F(0) + F(1)] - \frac{\pi}{192}F^{iv}(\xi) \qquad (|\xi| < 1)$$

63 Show that the error R in Prob. 55 can be written in the form

$$R = \int_{-1}^{1} x^2 (1 - x^2)^2\,F[-1, -1, 0, 0, 1, 1, x]\,dx$$

and that this form leads again to the result

$$R = \tfrac{1}{4725}F^{vi}(\xi) \qquad (|\xi| < 1)$$

64 Show that the error R relevant to the Newton-Cotes five-point formula of closed
type, as applied to F(x) over [-2, 2], can be expressed in the form

$$R = \int_{-2}^{2} V'(x)\,F[-2, -1, 0, 1, 2, x]\,dx$$

where

$$V(x) = \int_{-2}^{x} t(t^2 - 1)(t^2 - 4)\,dt$$

Show also that V(x) is an even function, so that V(-x) = V(x), and that V(2) = 0.
Show further that V(x) increases to a positive maximum value as x increases from
-2 to -1, that it then decreases steadily as x increases from -1 to 0, and that

$$V(0) = V(-1) + \int_{-1}^{0} t(t^2 - 1)(t^2 - 4)\,dt
      = V(-1) - \int_{-2}^{-1} \frac{3 + t}{2 - t}\,[t(t^2 - 1)(t^2 - 4)]\,dt
      = \left(1 - \frac{3 + \eta}{2 - \eta}\right)V(-1) \qquad (-2 < \eta < -1)$$

Hence deduce that V(0) is positive, that V(x) does not change sign in [-2, 2],
and therefore that

$$R = \frac{F^{vi}(\xi)}{6!}\int_{-2}^{2} x^2 (x^2 - 1)(x^2 - 4)\,dx = -\tfrac{8}{945}F^{vi}(\xi) \qquad (|\xi| < 2)$$

(A similar analysis, due to Steffensen [1950], applies to all Newton-Cotes formulas,
of closed type, employing an odd number of ordinates.)

65 For the Newton-Cotes six-point formula of closed type, as applied to F(x) over
[-2, 3], show that the function V(x) of Prob. 64 serves as an appropriate Q
function over [-2, 2], so that the error R can be expressed in the form

$$R = \frac{F^{vi}(\xi_1)}{6!}\int_{-2}^{2} x(x^2 - 1)(x^2 - 4)(x - 3)\,dx + \frac{F^{vi}(\xi_2)}{6!}\int_{2}^{3} x(x^2 - 1)(x^2 - 4)(x - 3)\,dx$$

$$\phantom{R} = -\tfrac{8}{945}F^{vi}(\xi_1) - \tfrac{863}{60480}F^{vi}(\xi_2) = -\tfrac{275}{12096}F^{vi}(\xi) \qquad (-2 < \xi < 3)$$


6 


NUMERICAL SOLUTION OF 
DIFFERENTIAL EQUATIONS 


6.1 Introduction 


Many techniques are available for the approximate solution of ordinary 
differential equations, or of sets of such equations, by numerical methods. 
This chapter presents a selection of frequently used procedures of various types 
and illustrates their application. In addition, an indication is given of the 
troublesome problem of error propagation in stepwise integration processes, 
and overall error bounds are obtained in illustrative cases. 

Some comments relative to the problem of selecting an appropriate 
technique are included in the concluding section (Sec. 6.17). Whereas most of 
the treatments deal with initial-value problems, brief considerations of boundary- 
value problems (Sec. 6.15) and characteristic-value problems (Sec. 6.16) are 
also included.


6.2 Formulas of Open Type


We consider first the problem in which it is desired to obtain a numerical
approximate solution of the first-order equation

$$\frac{dy}{dx} = F(x, y) \tag{6.2.1}$$

which takes on a prescribed value $y_0$ when $x = x_0$,

$$y(x_0) = y_0 \tag{6.2.2}$$

Starting with the known ordinate, it is proposed to calculate successively the
ordinates

$$y_1 = y(x_0 + h) = y(x_1), \quad y_2 = y(x_0 + 2h) = y(x_2), \quad \ldots, \quad y_n = y(x_0 + nh) = y(x_n), \quad \ldots \tag{6.2.3}$$

where h is a suitably chosen spacing.

For this purpose, we may, in particular, make use of the relation

$$y_{n+1} = y_n + \int_{x_n}^{x_n + h} y'(x)\,dx \tag{6.2.4}$$

Suppose that the ordinates $y_n, y_{n-1}, \ldots, y_1$, and $y_0$ are known. Then the
corresponding values of y'(x) are calculable from the formula

$$y'_k = y'(x_k) = F(x_k, y_k) \tag{6.2.5}$$

If we approximate y'(x) by the polynomial of degree N which takes on the
calculated values at the N + 1 points $x_n, x_{n-1}, \ldots$, and $x_{n-N}$, by making use
of the Newton backward-difference formula (4.3.8),

$$y'_{n+s} \approx \left[1 + s\nabla + \frac{s(s+1)}{2!}\nabla^2 + \cdots + \frac{s(s+1)\cdots(s+N-1)}{N!}\nabla^N\right] y'_n \tag{6.2.6}$$

where

$$s = \frac{x - x_n}{h} \tag{6.2.7}$$

we may use this polynomial to extrapolate y'(x) forward over the interval
$[x_n, x_n + h]$, for the purpose of approximately effecting the integration in-
dicated in (6.2.4).

The result of this calculation is

$$y_{n+1} \approx y_n + h\int_0^1 y'_{n+s}\,ds = y_n + h\sum_{k=0}^{N} a_k \nabla^k y'_n \tag{6.2.8}$$

where

$$a_k = \int_0^1 \frac{s(s+1)\cdots(s+k-1)}{k!}\,ds \tag{6.2.9}$$

the leading terms of (6.2.8) being of the form

$$y_{n+1} \approx y_n + h\left(1 + \tfrac{1}{2}\nabla + \tfrac{5}{12}\nabla^2 + \tfrac{3}{8}\nabla^3 + \tfrac{251}{720}\nabla^4 + \tfrac{95}{288}\nabla^5 + \cdots\right) y'_n \tag{6.2.10}$$

in accordance with (5.4.13).

The error term corresponding to truncation with the Nth difference of
$y'_n$ is given by h times the integral of the right-hand member of (4.3.9) with
f = y', in the form

$$E = h^{N+2}\int_0^1 \frac{s(s+1)\cdots(s+N)}{(N+1)!}\,y^{(N+2)}(\xi)\,ds$$

or, since the coefficient of $y^{(N+2)}$ does not change sign in [0, 1], in the form

$$E = a_{N+1}\,h^{N+2}\,y^{(N+2)}(\xi) \tag{6.2.11}$$

where $x_{n+1} > \xi > x_{n-N}$. Thus, for example, if only third differences are
retained, the error is given by $\tfrac{251}{720}h^5 y^{v}(\xi)$ where $x_{n+1} > \xi > x_{n-3}$.

More generally, we may use (6.2.6) in the relation

$$y_{n+1} = y_{n-p} + h\int_{-p}^{1} y'_{n+s}\,ds \tag{6.2.12}$$

where p is any positive integer, to express the ordinate following the nth one
in terms of the ordinate calculated p steps previously and in terms of, say,
N + 1 already calculated values of y'. The formulas most frequently used, in
addition to (6.2.10) with p = 0, are those for which p = 1, 3, and 5, the leading
terms of which are of the form

$$y_{n+1} \approx y_{n-1} + h\left(2 + 0\nabla + \tfrac{1}{3}\nabla^2 + \tfrac{1}{3}\nabla^3 + \tfrac{29}{90}\nabla^4 + \tfrac{14}{45}\nabla^5 + \cdots\right) y'_n \tag{6.2.13}$$

$$y_{n+1} \approx y_{n-3} + h\left(4 - 4\nabla + \tfrac{8}{3}\nabla^2 + 0\nabla^3 + \tfrac{14}{45}\nabla^4 + \tfrac{14}{45}\nabla^5 + \cdots\right) y'_n \tag{6.2.14}$$

and

$$y_{n+1} \approx y_{n-5} + h\left(6 - 12\nabla + 15\nabla^2 - 9\nabla^3 + \tfrac{33}{10}\nabla^4 + 0\nabla^5 + \cdots\right) y'_n \tag{6.2.15}$$

Whereas the error associated with terminating one of these formulas
with the Nth difference can be expressed in the form

$$E = h^{N+2}\int_{-p}^{1} \frac{s(s+1)\cdots(s+N)}{(N+1)!}\,y^{(N+2)}(\xi)\,ds \tag{6.2.16}$$

where $\xi$ now depends upon s and lies between $x_{n+1}$ and the smaller of $x_{n-p}$
and $x_{n-N}$, the fact that the coefficient of $y^{(N+2)}$ changes sign in the integration
range when p > 0 makes it impossible to apply the law of the mean directly
in order to obtain a simple form similar to (6.2.11). Somewhat more complicated
forms are obtainable by subdividing the range of integration and applying
the law of the mean to each subinterval, or, better, by using one of the methods
of Secs. 5.10 and 5.11.
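
The coefficients in (6.2.10) and (6.2.13) to (6.2.15) all come from the same integral, namely that of the factorial polynomial s(s + 1)...(s + k - 1)/k! over [-p, 1]. The following is a minimal sketch of that computation in exact rational arithmetic; the function names (`poly_mul`, `poly_integral`, `open_coefficients`) are illustrative choices and do not appear in the text.

```python
from fractions import Fraction
from math import factorial


def poly_mul(a, b):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_integral(coeffs, lo, hi):
    """Integrate sum(c_k * s**k) exactly from lo to hi."""
    lo, hi = Fraction(lo), Fraction(hi)
    return sum(c * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1)
               for k, c in enumerate(coeffs))


def open_coefficients(p, n_terms):
    """Coefficients of nabla^k y'_n, k = 0..n_terms-1, in the open formula
    y_{n+1} ~ y_{n-p} + h * sum(c_k nabla^k y'_n); p = 0 gives (6.2.9)."""
    coeffs = []
    poly = [Fraction(1)]  # running product s(s+1)...(s+k-1), empty for k = 0
    for k in range(n_terms):
        coeffs.append(poly_integral(poly, -p, 1) / factorial(k))
        poly = poly_mul(poly, [Fraction(k), Fraction(1)])  # times (s + k)
    return coeffs


if __name__ == "__main__":
    for p in (0, 1, 3, 5):
        print(p, [str(c) for c in open_coefficients(p, 6)])
```

For p = 1, 3, and 5 the printed lists show a zero in position p, which is the vanishing pth-difference coefficient that the text goes on to discuss.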

The formulas for which p is an odd integer are of particular interest because
of the fact that, in each such formula, the coefficient of the pth difference is
found to be zero. In these cases, the retention of p - 1 differences thus affords
the same accuracy as the retention of p differences. Indeed, the cases in which
N = p correspond to the use of Newton-Cotes formulas of open type, employ-
ing an odd number of ordinates, in the integration indicated in (6.2.12). Further,
the error terms in those cases can be expressed in a form similar to (6.2.11)
and are given for p = 3 and p = 5 in Eqs. (3.5.20) and (3.5.22). Thus, in
particular, we have the special formulas

$$y_{n+1} = y_{n-1} + 2h\,y'_n + \frac{h^3}{3}\,y'''(\xi) \tag{6.2.17}$$

$$y_{n+1} = y_{n-3} + 4h\left(y'_n - \nabla y'_n + \tfrac{2}{3}\nabla^2 y'_n\right) + \tfrac{14}{45}h^5 y^{v}(\xi) \tag{6.2.18}$$

and

$$y_{n+1} = y_{n-5} + 6h\left(y'_n - 2\nabla y'_n + \tfrac{5}{2}\nabla^2 y'_n - \tfrac{3}{2}\nabla^3 y'_n + \tfrac{11}{20}\nabla^4 y'_n\right) + \tfrac{41}{140}h^7 y^{vii}(\xi) \tag{6.2.19}$$

where, in each case, $\xi$ lies between the largest and smallest of the arguments
involved in that formula. These formulas, and corresponding ones for p =
7, 9, ..., have the property that, in each case, the retention of differences
through the Nth leads to a formula with "accuracy of order N + 2," that is,
to an error term proportional to $h^{N+3}$, whereas for the other formulas of the
type considered here the accuracy corresponding to the retention of differences
through the Nth is of order N + 1.†

It is clear that, since a formula employing Nth differences depends upon
knowledge of N + 1 successive values of y', and since initially only $y_0$ is known,
such a formula cannot be used until N additional ordinates have been deter-
mined by another method. Before illustrating the use of such formulas, it is
desirable to consider a class of related formulas.


† It is seen that the terminology here is also such that a formula with "accuracy of
order m" would yield exact results if the required solution y(x) were a polynomial of
degree m or less. When y(x) is not such a function, it is not necessarily true that an
increase in m corresponds to an improvement in the approximation afforded, as was
seen in Sec. 3.9.


6.3 Formulas of Closed Type


The formulas derived in the preceding section express $y_{n+1}$ in terms only of
previously calculated ordinates and slopes. A set of similar formulas which
involve also the unknown slope $y'_{n+1}$ is obtained by replacing the right-hand
member of (6.2.6) by the interpolation polynomial agreeing with y'(x) at
$x_{n+1}, x_n, \ldots, x_{n-N+1}$:

$$y'_{n+s} \approx \left[1 + (s-1)\nabla + \frac{(s-1)s}{2!}\nabla^2 + \cdots + \frac{(s-1)s(s+1)\cdots(s+N-2)}{N!}\nabla^N\right] y'_{n+1} \tag{6.3.1}$$

where s is again defined by (6.2.7). If this approximation is introduced into
(6.2.12), the results in the cases p = 0, 1, and 3 are obtained in the forms

$$y_{n+1} \approx y_n + h\left(1 - \tfrac{1}{2}\nabla - \tfrac{1}{12}\nabla^2 - \tfrac{1}{24}\nabla^3 - \tfrac{19}{720}\nabla^4 - \tfrac{3}{160}\nabla^5 - \cdots\right) y'_{n+1} \tag{6.3.2}$$

$$y_{n+1} \approx y_{n-1} + h\left(2 - 2\nabla + \tfrac{1}{3}\nabla^2 + 0\nabla^3 - \tfrac{1}{90}\nabla^4 - \tfrac{1}{90}\nabla^5 - \cdots\right) y'_{n+1} \tag{6.3.3}$$

and

$$y_{n+1} \approx y_{n-3} + h\left(4 - 8\nabla + \tfrac{20}{3}\nabla^2 - \tfrac{8}{3}\nabla^3 + \tfrac{14}{45}\nabla^4 - 0\nabla^5 - \cdots\right) y'_{n+1} \tag{6.3.4}$$

The error associated with retaining only Nth differences in a formula
relating $y_{n+1}$ and $y_{n-p}$ can be expressed in the form

$$E = h^{N+2}\int_{-p}^{1} \frac{(s-1)s(s+1)\cdots(s+N-1)}{(N+1)!}\,y^{(N+2)}(\xi)\,ds \tag{6.3.5}$$

where $\xi$ lies between $x_{n+1}$ and the smaller of $x_{n-p}$ and $x_{n-N+1}$. When p = 0,
the law of the mean can be used, as in the preceding section, to show that the
error is expressible in the form (6.2.11), where $a_{N+1}$ is the numerical coefficient
in the first neglected term. In the cases for which p is an odd integer, it is found
that retention of p + 1 differences is equivalent to the retention of p + 2
differences and that the use of these special formulas corresponds to the use of
Newton-Cotes formulas of closed type, employing an odd number of ordinates,
for which the error terms are obtainable from Sec. 3.5. Thus, when p = 1 and
p = 3, we have the special formulas

$$y_{n+1} = y_{n-1} + 2h\left(y'_{n+1} - \nabla y'_{n+1} + \tfrac{1}{6}\nabla^2 y'_{n+1}\right) - \frac{h^5}{90}\,y^{v}(\xi) \tag{6.3.6}$$

and

$$y_{n+1} = y_{n-3} + 4h\left(y'_{n+1} - 2\nabla y'_{n+1} + \tfrac{5}{3}\nabla^2 y'_{n+1} - \tfrac{2}{3}\nabla^3 y'_{n+1} + \tfrac{7}{90}\nabla^4 y'_{n+1}\right) - \frac{8h^7}{945}\,y^{vii}(\xi) \tag{6.3.7}$$

for which the retention of Nth differences yields an accuracy of order N + 2,
whereas the other formulas of the type considered generally yield (N + 1)th-
order accuracy.

Formulas of the sort derived in this section are said to be of closed type,
since the expressions for the required ordinate $y_{n+1}$ at the point $x_{n+1}$ involve
the unknown slope $y'_{n+1}$ at that point, whereas those of the preceding section
involve only known slopes at preceding points and are accordingly said to be of
open type. A comparison of corresponding formulas employing a like number
of differences shows that the error terms associated with formulas of closed type
possess smaller numerical coefficients. However, since the unknown $y_{n+1}$
is involved (explicitly and implicitly) in both members of formulas of closed
type, it would appear that this advantage must be weighed against the fact that,
unless y' = F(x, y) is a linear function of y, the equation relevant to such a
formula generally must be solved for $y_{n+1}$ by iterative methods. (This point is
considered further in Sec. 6.6.)


6.4 Start of Solution 


Except for the special formula

$$y_{n+1} = y_n + h\,y'_n + \frac{h^2}{2}\,y''(\xi) \tag{6.4.1}$$

obtained by omitting all differences in (6.2.10), and for simple closed formulas
obtained from (6.3.2) by retaining not more than one difference, each of the
formulas obtained in the preceding sections can be applied only after the calcula-
tion of a number of ordinates $y_1, y_2, \ldots, y_r$, in addition to the prescribed
ordinate $y_0$, where r is the number of differences retained in an open formula
and is one less than that number in a closed formula.

One method of starting the solution of the problem

$$\frac{dy}{dx} = F(x, y) \qquad y(x_0) = y_0 \tag{6.4.2}$$

consists of determining the coefficients of a finite Taylor expansion

$$y_s \equiv y(x_0 + hs) = y_0 + \frac{h\,y'_0}{1!}\,s + \frac{h^2 y''_0}{2!}\,s^2 + \cdots + \frac{h^r y_0^{(r)}}{r!}\,s^r + \frac{h^{r+1} y^{(r+1)}(\xi)}{(r+1)!}\,s^{r+1} \tag{6.4.3}$$

where $y_0^{(k)} = (d^k y/dx^k)_{x = x_0}$ and $x_0 < \xi < x_0 + hs$, by successively differen-
tiating the basic differential equation or otherwise, under the assumption
that a representation of this type exists when s is sufficiently small. Thus,
recalling that $d/dx = \partial/\partial x + y'\,\partial/\partial y$, we obtain the relations

$$y' = F(x, y)$$
$$y'' = F_x(x, y) + y' F_y(x, y)$$
$$y''' = F_{xx}(x, y) + 2y' F_{xy}(x, y) + y'^2 F_{yy}(x, y) + y'' F_y(x, y)$$

and so forth, and hence there follows

$$y'_0 = F(x_0, y_0) \qquad y''_0 = F_x(x_0, y_0) + y'_0 F_y(x_0, y_0) \tag{6.4.4}$$

and so forth.

Whereas these general expressions become quite involved as the order of
the required derivative increases, they are not actually needed in practice. In
order to illustrate this fact, we consider the specific example

$$\frac{dy}{dx} = x^2 - y \qquad y(0) = 1 \tag{6.4.5}$$

for which the exact solution is readily found to be

$$y = 2 - 2x + x^2 - e^{-x} \tag{6.4.6}$$

From the given equation, we obtain successively

$$y' = x^2 - y, \quad y'' = 2x - y', \quad y''' = 2 - y'', \quad y^{iv} = -y''', \quad y^{v} = -y^{iv}, \quad \ldots \tag{6.4.7}$$

and hence, with $x_0 = 0$, there follows

$$y_0 = 1, \quad y'_0 = -1, \quad y''_0 = y'''_0 = 1, \quad y_0^{iv} = -1, \quad y_0^{v} = 1, \quad \ldots$$

Thus, if we take $h = \tfrac{1}{10}$, Eq. (6.4.3) gives

$$y_s = 1 - \frac{s}{10} + \frac{1}{2}\left(\frac{s}{10}\right)^2 + \frac{1}{6}\left(\frac{s}{10}\right)^3 - \frac{1}{24}\left(\frac{s}{10}\right)^4 + \frac{1}{120}\left(\frac{s}{10}\right)^5 - \cdots \tag{6.4.8}$$

and, with s = 1, 2, and 3, we obtain

$$y_1 = 0.90516 \qquad y_2 = 0.82127 \qquad y_3 = 0.74918 \tag{6.4.9}$$

to five places. Since the successive terms of (6.4.8) alternate in sign from the
fourth term onward and decrease steadily in magnitude, the error due to trunca-
tion is smaller in magnitude than the first neglected term and is of the same sign.
Additional ordinates could be obtained to this accuracy by retaining sufficiently
many terms in the expansion. Alternatively, a new expansion could be launched
from the point $x_3$, in the form

$$y_{3+s} = y_3 + \frac{h\,y'_3}{1!}\,s + \frac{h^2 y''_3}{2!}\,s^2 + \cdots$$

with $y_3$ known and $y'_3, y''_3, \ldots$ calculable in terms of $y_3$ from (6.4.7).
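
The Taylor-series start for the example (6.4.5) can be sketched in a few lines. The sketch below assumes the recurrences (6.4.7), which are specific to y' = x^2 - y, and compares the truncated series (6.4.8) with the exact solution (6.4.6); the function names are illustrative only.

```python
import math


def taylor_derivatives(x0, y0, count):
    """Return [y, y', y'', ...] at x0 for y' = x**2 - y, using (6.4.7).

    Assumes count >= 4; beyond y''' each derivative is minus the previous one.
    """
    d = [y0]
    d.append(x0 ** 2 - d[0])   # y'   = x^2 - y
    d.append(2 * x0 - d[1])    # y''  = 2x - y'
    d.append(2 - d[2])         # y''' = 2 - y''
    while len(d) < count:
        d.append(-d[-1])       # y^(k+1) = -y^(k) for k >= 3
    return d[:count]


def taylor_value(derivs, h, s):
    """Evaluate the truncated expansion (6.4.3) at x0 + h*s."""
    return sum(dk * (h * s) ** k / math.factorial(k)
               for k, dk in enumerate(derivs))


def exact(x):
    """Exact solution (6.4.6)."""
    return 2 - 2 * x + x ** 2 - math.exp(-x)


if __name__ == "__main__":
    derivs = taylor_derivatives(0.0, 1.0, 6)   # through the fifth derivative
    for s in (1, 2, 3):
        print(s, round(taylor_value(derivs, 0.1, s), 5), round(exact(0.1 * s), 5))
```

Restarting from x_3, as described above, amounts to calling `taylor_derivatives(0.3, y3, 6)` with the computed y_3.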

It is obvious that the linear example chosen illustrates a particularly 
simple case, because of the simplicity of the relations (6.4.7) and because of the 
fact that (6.4.8) is an alternating series and hence is amenable to a precise 
truncation-error analysis. More usually, the relations (6.4.7) are replaced by 
successive equations which increase fairly rapidly in complexity, so that it is 
usually desirable to abandon this procedure in favor of a more convenient 
one when sufficiently many starting values have been obtained.†

Discussion of the existence of (6.4.3) in the general case of the problem
(6.4.2), as well as consideration of other types of representations which can be
used when (6.4.3) cannot, must be omitted here (see Sec. 6.18). In some cases
it is preferable to determine the coefficients in an assumed expansion of the form

$$y(x) = \sum_{k=0}^{\infty} A_k (x - x_0)^k$$

by inserting that expansion in the differential equation and obtaining a recur-
rence formula to be satisfied by the A's.

A method similar to the preceding one, which has the advantage that the
order of the highest derivative required is about half that needed in (6.4.3),
but has the disadvantage that each forward step involves an iterative process, is
treated in Sec. 6.12.

Mention should also be made of Picard's method, in which the problem
(6.4.2) is first transformed into the integral equation

$$y(x) = y_0 + \int_{x_0}^{x} F(x, y(x))\,dx$$

and successive functions approximating y(x) near $x = x_0$ are generated by
the iteration

$$y^{[k+1]}(x) = y_0 + \int_{x_0}^{x} F(x, y^{[k]}(x))\,dx \tag{6.4.10}$$

The initial approximation $y^{[0]}(x)$ is conveniently taken to be the constant $y_0$
or the linear function $y_0 + y'_0 (x - x_0)$, where $y'_0$ is determined from the
differential equation.

† For adaptations of the "Taylor-series method" to computers, see, for example,
references given in Lapidus and Seinfeld [1971], p. 81.

Thus, in the preceding example, we would write

$$y^{[k+1]}(x) = 1 + \int_0^x [x^2 - y^{[k]}(x)]\,dx \tag{6.4.11}$$

and, with $y^{[0]}(x) = 1$, there would then follow

$$y^{[1]}(x) = 1 - x + \tfrac{1}{3}x^3 \qquad y^{[2]}(x) = 1 - x + \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \tfrac{1}{12}x^4 \tag{6.4.12}$$

and so forth. The accuracy afforded by a member of the sequence of approx-
imations at a certain number of points $x_1, x_2, \ldots$ could be estimated by
comparing calculated values at those points with values calculated from the
preceding approximation, or by use of appropriate analytical methods.
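
For the polynomial right-hand side of (6.4.5), each Picard iterate is itself a polynomial, so the iteration (6.4.11) can be carried out exactly on coefficient lists. The following minimal sketch does this with rational arithmetic; the representation (ascending coefficients) and the name `picard_step` are assumptions made for illustration.

```python
from fractions import Fraction


def picard_step(y):
    """One Picard iteration (6.4.11) for y' = x**2 - y, y(0) = 1.

    y is a list of ascending polynomial coefficients of the current iterate;
    the result is the coefficient list of 1 + integral_0^x (t**2 - y(t)) dt.
    """
    integrand = [-c for c in y]
    while len(integrand) < 3:
        integrand.append(Fraction(0))
    integrand[2] += 1                      # add the x^2 term
    # Integrate term by term and prepend the constant y(0) = 1.
    return [Fraction(1)] + [c / (k + 1) for k, c in enumerate(integrand)]


if __name__ == "__main__":
    y = [Fraction(1)]                      # initial approximation y = 1
    for k in range(1, 4):
        y = picard_step(y)
        print(k, [str(c) for c in y])
```

The first two iterates reproduce (6.4.12); each further iterate adds one more power of x.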

While Picard’s method is of great theoretical importance, the explicit 
evaluation of the integral in (6.4.10) is often impracticable in cases which are 
less simple than the preceding one. Thus, for the problem $y' = \cos(x + y)$,
$y(0) = 1$, the first iteration with $y^{[0]}(x) = 1$ gives $y^{[1]}(x) = 1 - \sin 1 + \sin(x + 1)$,
and the second iteration would involve the evaluation of the form

$$y^{[2]}(x) = 1 + \int_0^x \cos[1 - \sin 1 + x + \sin(x + 1)]\,dx$$


Also, when F(x, y) is not given analytically, neither this procedure nor the 
Taylor-series method is directly applicable. 

A frequently used class of procedures consists of evaluating the integral
of y' = F(x, y) in (6.4.10) approximately by use of numerical methods. Thus,
in particular, if y' is approximated by the Newton forward-difference polynomial-
interpolation formula, the results of the integration are obtained by replacing
p(x) by y'(x) in (5.4.8), and we have the formulas

$$y_1 = y_0 + h\left[1 + \tfrac{1}{2}\Delta - \tfrac{1}{12}\Delta^2 + \tfrac{1}{24}\Delta^3 - \tfrac{19}{720}\Delta^4 + \cdots\right] y'_0$$
$$y_2 = y_0 + h\left[2 + 2\Delta + \tfrac{1}{3}\Delta^2 + 0\Delta^3 - \tfrac{1}{90}\Delta^4 + \cdots\right] y'_0$$
$$y_3 = y_0 + h\left[3 + \tfrac{9}{2}\Delta + \tfrac{9}{4}\Delta^2 + \tfrac{3}{8}\Delta^3 - \tfrac{3}{80}\Delta^4 + \cdots\right] y'_0 \tag{6.4.13}$$
$$y_4 = y_0 + h\left[4 + 8\Delta + \tfrac{20}{3}\Delta^2 + \tfrac{8}{3}\Delta^3 + \tfrac{14}{45}\Delta^4 + \cdots\right] y'_0$$

and so forth. Here $y_0$ is given, and if, say, $y_1$, $y_2$, $y_3$, and $y_4$ are estimated,
the corresponding values of $y'_0, \ldots, y'_4$ can be calculated from the differential
equation and introduced into the right-hand members to give the new approx-
imations to $y_1, \ldots, y_4$, after which the process may be iterated.

In the case of (6.4.5) we may notice that, since the value $y'_0 = -1$ is
obtained from the differential equation, the initial approximation $y^{[0]}(x) = 1 - x$
is appropriate. With h = 0.1, the following initial array then may be
formed when three additional ordinates are incorporated:

   x        y          y'       Δy'     Δ²y'    Δ³y'
  0.0    1.00000    -1.00000
                               11000
  0.1    0.90000    -0.89000             2000
                               13000               0
  0.2    0.80000    -0.76000             2000
                               15000
  0.3    0.70000    -0.61000

The use of the first three of Eqs. (6.4.13), retaining third differences, leads to
the new array

   x        y          y'       Δy'     Δ²y'    Δ³y'
  0.0    1.00000    -1.00000
                               10467
  0.1    0.90533    -0.89533              799
                               11266            -198
  0.2    0.82267    -0.78267              601
                               11867
  0.3    0.75400    -0.66400

Three additional iterations yield results which are unchanged to five places by
further iteration and which are correct to those places. The correctness of those
values would be checked, in practice, by considering the effects of the neglected
fourth differences, sample values of which become available as the calculation
is advanced from this stage.
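
The iterative start just described can be sketched as follows. The sketch uses the first three formulas of (6.4.13) truncated after third differences, works in floating point rather than with five-place rounding, and uses illustrative names (`forward_differences`, `start_iteration`, `COEFFS`) that are not taken from the text.

```python
import math

# Coefficients of 1, Delta, Delta^2, Delta^3 in the first three rows of (6.4.13).
COEFFS = {
    1: [1, 1 / 2, -1 / 12, 1 / 24],
    2: [2, 2, 1 / 3, 0],
    3: [3, 9 / 2, 9 / 4, 3 / 8],
}


def forward_differences(values):
    """Return [f_0, Delta f_0, Delta^2 f_0, ...] for equally spaced values."""
    diffs = [values[0]]
    row = list(values)
    while len(row) > 1:
        row = [b - a for a, b in zip(row, row[1:])]
        diffs.append(row[0])
    return diffs


def start_iteration(F, x0, y0, h, y_est):
    """One sweep of (6.4.13): from estimates of y_1, y_2, y_3 to new estimates."""
    xs = [x0 + h * m for m in range(4)]
    ys = [y0] + list(y_est)
    slopes = [F(x, y) for x, y in zip(xs, ys)]
    d = forward_differences(slopes)
    return [y0 + h * sum(c * dk for c, dk in zip(COEFFS[m], d))
            for m in (1, 2, 3)]


if __name__ == "__main__":
    F = lambda x, y: x ** 2 - y            # example (6.4.5)
    est = [0.9, 0.8, 0.7]                  # from the initial approximation 1 - x
    for sweep in range(1, 6):
        est = start_iteration(F, 0.0, 1.0, 0.1, est)
        print(sweep, [round(v, 5) for v in est])
```

The first sweep reproduces the ordinates of the second array above, and later sweeps settle on the five-place values of (6.4.9).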

A useful variation of this procedure consists of using central differences
and of determining ordinates on both sides of the initial point $x_0$. Thus, by
appropriate integration of the Stirling interpolation formula, we obtain the
relations

$$y_{-2} = y_0 + h\left[-2 + 2\mu\delta - \tfrac{4}{3}\delta^2 + \tfrac{1}{3}\mu\delta^3 - \tfrac{7}{45}\delta^4 + \cdots\right] y'_0$$
$$y_{-1} = y_0 + h\left[-1 + \tfrac{1}{2}\mu\delta - \tfrac{1}{6}\delta^2 - \tfrac{1}{24}\mu\delta^3 + \tfrac{1}{180}\delta^4 - \cdots\right] y'_0 \tag{6.4.14}$$
$$y_1 = y_0 + h\left[1 + \tfrac{1}{2}\mu\delta + \tfrac{1}{6}\delta^2 - \tfrac{1}{24}\mu\delta^3 - \tfrac{1}{180}\delta^4 + \cdots\right] y'_0$$
$$y_2 = y_0 + h\left[2 + 2\mu\delta + \tfrac{4}{3}\delta^2 + \tfrac{1}{3}\mu\delta^3 + \tfrac{7}{45}\delta^4 + \cdots\right] y'_0$$

Since here the calculated ordinates are taken as close to the given one as is
possible, the convergence of the iterative process is generally more rapid than that
associated with the use of (6.4.13) unless the solution displays unfavorable
characteristics to the left of the point $x_0$.† Since truncation with a mean odd
difference would not correspond to true collocation, it is desirable to use a
symmetrical array of abscissas.

† The calculation of ordinates on both sides of the starting point is also frequently
convenient when use is made of Taylor series.

In the case of the preceding example, we may start with the array

    x        y          y'       δy'      δ²y'     δ³y'    δ⁴y'
  -0.2    1.20000    -1.16000
                                 7000
  -0.1    1.10000    -1.09000             2000
                                 9000                0
   0.0    1.00000    -1.00000  (10000)    2000      (0)      0
                                11000                0
   0.1    0.90000    -0.89000             2000
                                13000
   0.2    0.80000    -0.76000

when four additional ordinates are used. The mean odd central differences
are entered in parentheses. After four iterations, using (6.4.14), we obtain the
array

    x        y          y'       δy'      δ²y'     δ³y'    δ⁴y'
  -0.2    1.21860    -1.17860
                                 8377
  -0.1    1.10483    -1.09483             1106
                                 9483              -105
   0.0    1.00000    -1.00000             1001               9
                                10484               -96
   0.1    0.90516    -0.89516              905
                                11389
   0.2    0.82127    -0.78127

which is unchanged, to the five places retained, by further iteration. Thus
five-place values of $y_{-2}$, $y_{-1}$, $y_0$, $y_1$, and $y_2$ are now available for the advancing
calculation. Here the fact that the effect of the fourth differences is negligible
supplies fair evidence that sufficiently many ordinates were used.

It is important to notice that the formulas of (6.4.13) or (6.4.14) can also
be expressed explicitly in terms of the slopes $y'_k$, if the use of differences is
undesirable, once the number of slopes to be retained has been decided (the
corresponding five-slope formulas are given in Milne [1970]).

Another class of self-starting methods, which are also useful when F(x, y) 
is not defined analytically, but which are noniterative, is treated in Secs. 6.13 
and 6.14. 


6.5 Methods Based on Open-type Formulas 


Once at least N additional ordinates, say, $y_1, y_2, \ldots, y_N$, are determined, the
calculation may be continued by use of one of the formulas, derived in Sec.
6.2 or 6.3, which involves Nth differences. In the case of the example (6.4.5),
with the calculated data (6.4.9), the preliminary tabulation may be arranged
as in Table 6.1 where, for compactness, the backward difference $\nabla^k y'_n$ is written
in the same line as the entry $y_n$.

Table 6.1

   x        y          y'       ∇y'     ∇²y'    ∇³y'    ∇⁴y'
  0.0    1.00000    -1.00000
  0.1    0.90516    -0.89516   10484
  0.2    0.82127    -0.78127   11389     905
  0.3    0.74918    -0.65918   12209     820     -85

In particular, the Adams (or Adams-Bashforth) method (Bashforth and
Adams [1883]) uses formula (6.2.10), truncated to a suitable number of terms,
for advancing the calculation. (The simplest such procedure, in which no
differences are retained, is often known as Euler's method.) Thus, if third
differences are retained, the Adams method next yields

$$y_4 \approx 0.74918 + \tfrac{1}{10}\left[-0.65918 + \tfrac{1}{2}(0.12209) + \tfrac{5}{12}(0.00820) - \tfrac{3}{8}(0.00085)\right] = 0.68968$$

after which an additional line

  0.4    0.68968    -0.52968   12950     741     -79       6        (6.5.1)

is entered for the purpose of advancing to $y_5$. If again only third differences are
retained, the next line appears as follows:

  0.5    0.64347    -0.39347   13621     671     -70       9        (6.5.2)

The fourth difference is carried along as a partial check column. Since
the truncation error in each step is of the form

$$\tfrac{251}{720}\,h^5 y^{v}(\xi)$$

for some $\xi$, and since $h^4 y^{v}(\eta)$ is given by $\nabla^4 y'_n$ for some $\eta$, the two available
sample values of $\nabla^4 y'$ indicate that $h^4 y^{v}$ probably does not vary strongly over
the relevant range, so that a fairly dependable estimate of the truncation error
committed in each of the steps can be obtained by calculating the contribution
$\tfrac{251}{720}\,h\,\nabla^4 y'_n$ of the first neglected difference. With h = 0.1, this contribution
will amount to less than one-half unit in the fifth place if $\nabla^4 y'_n$ does not exceed
14 units in that place.

If use is made instead of formula (6.2.18), in which only second differences
are retained, the same results are obtained. Here the error estimate again
depends upon the fourth difference, the factor $\tfrac{14}{45} \approx 0.31$ replacing the factor
$\tfrac{251}{720} \approx 0.35$ relevant to the Adams formula with third differences. Thus, as
compared with the Adams method, this method here possesses the advantage
that one less difference is needed in the calculation (but not in the error check)
and that the coefficients in the formula are somewhat simpler.

It should be emphasized that the errors so far considered are those which
would arise in a single step from $x_n$ to $x_{n+1}$ if $y_0, y_1, \ldots, y_n$ were exactly
correct and if no roundoff errors were introduced in that step. In addition,
however, one must consider the cumulative effect of the errors introduced in
preceding steps. Whereas consideration of the propagation of errors is post-
poned to Secs. 6.7 and 6.8, it may be remarked here that the advantage in
stability generally lies with the Adams method. This situation is related to the
fact that the ordinates themselves are "loosely coupled" by (6.2.18), in that the
ordinate $y_n$ is linked directly only with ordinates of the form $y_{n-4i}$, where i is an
integer, whereas in (6.2.10) all ordinates are directly linked together.
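
The two Adams steps leading to lines (6.5.1) and (6.5.2) can be sketched as below. The sketch assumes the difference form (6.2.10) truncated after third differences and rounds each new ordinate to five places, as in the hand computation; the function names and the `places` parameter are illustrative assumptions.

```python
ADAMS = [1, 1 / 2, 5 / 12, 3 / 8]   # coefficients of 1, nabla, nabla^2, nabla^3


def backward_differences(values):
    """Return [f_n, nabla f_n, nabla^2 f_n, ...] for the last entry f_n."""
    diffs = [values[-1]]
    row = list(values)
    while len(row) > 1:
        row = [b - a for a, b in zip(row, row[1:])]
        diffs.append(row[-1])
    return diffs


def adams_step(F, xs, ys, h, places=5):
    """Advance one step with (6.2.10), using the last four points of (xs, ys)."""
    slopes = [F(x, y) for x, y in zip(xs[-4:], ys[-4:])]
    d = backward_differences(slopes)
    y_next = ys[-1] + h * sum(c * dk for c, dk in zip(ADAMS, d))
    return round(y_next, places)


if __name__ == "__main__":
    F = lambda x, y: x ** 2 - y                    # example (6.4.5)
    h = 0.1
    xs = [0.0, 0.1, 0.2, 0.3]
    ys = [1.0, 0.90516, 0.82127, 0.74918]          # starting values (6.4.9)
    for _ in range(2):
        ys.append(adams_step(F, xs, ys, h))
        xs.append(round(xs[-1] + h, 10))
        print(xs[-1], ys[-1])
```

Keeping one more difference in `backward_differences` than is used in the step gives the check column discussed above.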


6.6 Methods Based on Closed-type Formulas. 
Prediction-Correction Methods 


A possible method of employing one of the formulas of Sec. 6.3 to calculate
$y_{n+1}$ consists of first estimating $y_{n+1}$, calculating $y'_{n+1} = F(x_{n+1}, y_{n+1})$ cor-
responding to this estimate, forming the requisite corresponding differences
$\nabla^k y'_{n+1}$, and then calculating an improved estimate of $y_{n+1}$ by use of the formula.
The cycle then is to be repeated, if necessary, until two successive estimates
agree within the prescribed tolerance, assuming convergence of this iterative
process. The initial estimate, say, $y_{n+1}^{(0)}$, may be obtained by use of a formula of
open type.

In illustration, returning to the example considered in the preceding
section, line (6.5.1) can be considered as the result of using the Adams method,
with third differences, as a predictor. If now the data in this line are used in
(6.3.2), truncated also with third differences, the first correction $y_4^{(1)}$ is given by

$$y_4^{(1)} = 0.74918 + \tfrac{1}{10}\left[-0.52968 - \tfrac{1}{2}(0.12950) - \tfrac{1}{12}(0.00741) + \tfrac{1}{24}(0.00079)\right] = 0.68968$$

which agrees with the initial prediction to the five places retained, so that
iteration is not needed.

This specific process of prediction and successive correction is considered
now in somewhat more detail. For this purpose, we here denote by $Y_{n+1}$ the
true ordinate at $x_{n+1}$ and by $y_{n+1}$ the approximation which would be afforded
by the chosen truncation of (6.3.2) if the resultant equation could be solved
exactly. Further, we again denote by $y_{n+1}^{(0)}$ the initial prediction yielded by the
corresponding truncation of (6.2.10) and by $y_{n+1}^{(1)}, y_{n+1}^{(2)}, \ldots$, the results of the
successive iterations, or corrections, just described.

In order to simplify the analysis, we suppose that the errors in all the 


previously calculated ordinates yo, ¥1,..-,¥, and slopes yo, Vis ks Dy 816 
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negligible. If only third differences are retained in (6.2.10) and (6.3.2), there 
then follows 


0 
αν 


y, + = ὅδ; — 59γ!., + 37y!_, —9y'_3) 6.1) 
and also 


h / / / ? 
κει = Yn + 54 Ova 1 19y,, r3 Vasa ἘΝ γ,...2) (6.6.2) 


where y,,, and y,,, are related by the equation 
Yost = ον) (6.6.3) 
In addition, the true ordinate Y,,, satisfies the equations 
h , t f ’ Vv 
Yu+1 = Ya + 54 Ne — 59Yn—1 + 37γ..) — 9Vn—3) + F30h?y"(E1) (6.6.4) 
and 


h / , / / Ὺ 
Υ ει = Ya + 54 0 tat + 19y, = 5)».-.1 1 Yn=2) — ΣΟΙ (C2) (6.6.5) 


where both €, and €, lie between x,_, and x,,,, and where 
γι = F(Xn415 Yaga) (6.6.6) 


Finally, we have 
esa = Yn Ties ar = (9F ns yo) cg 19y,, τος 5} 6--Ἰ ΒΗ Yn-2] (6.6.7) 


Accordingly, there follows from (6.6.1) and (6.6.4) 
Yaar — Woes = F5bhy'(E,) (6.6.8) 
and, from (6.6.2) and (6.6.5), 


3h 
Ynt1 — nti = ry LF (Xn415 144) > F (Xn+15 Yn+1) i ἩΣ 61" y(C2) (6.6.9) 


Assuming the existence of F, = OF /dy in the relevant region of the xy plane, 
the mean-value theorem gives 


F(Xn+19 Yas) — F(Xn+15 nt = (Yat. - Yat Fy (%n415 Nn+1) (6.6.10) 


for some y, 4, between y,,, and Y,44. 
If now it is assumed that ἢ is sufficiently small to ensure that 


3h 
8 ΙΕ (χα 15 ἤ,..1}} « | (6.6.11) 


Φ 
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so that the first term on the right in (6.6.9) is relatively negligible, and also that 
γ᾽ ΑἹ does not vary strongly for x,_,; < x < X,4,;, so that y(¢,) and y"(€) 
can be equated to a first approximation, Eqs. (6.6.8) and (6.6.9) lead to the 
useful approximate relation 


Yet. — ne ® Sj oN (Yaar — Wes 
or 
Yat. — Yar. Ὁ B70 Vn+1 (6.6.12) 
where 
Ynt1 = Vnoi — »έφι (6.6.13) 


Thus, if a column of the differences y, = y, — y{° is carried along 
in the calculation, the error in the final iterate y,,, which is due to truncation 
error in the step from x, to x, can be estimated as —19y,, ,/270 + —y,4 4/14. 
The reliability of this estimate depends upon the validity of the assumption that 
errors propagated from preceding calculations are negligible and upon the 
smallness of the magnitudes of AF, and hy" in the relevant region. In this 
connection, it may be noted that if the first neglected difference V*);,4, does 
not vary strongly with k, it can be expected that the same is true of y’, so that 
hy” probably is small relative to y’. 

The magnitude of AF, near (x,41, Yn+i) turns out to be the governing 
factor relative to the convergence of the iteration leading from γί), to Yrs4 
since, from (6.6.3) and (6.6.7), there follows 


eee ᾿ 
Yn+1 — Vo = 8 ΓΕ(Χ,. 1. »,...1) -- Ε(Χ,..1. ye] 


3h i i 
= Ε Fy (Xn4 1 i) (Yn+1 — yy) 


Hence, if |F,| Ξ K,+1 near (%+41, Yn+1), we have 


i 3h i 
lat. — yl Ξ π Kav il¥nes = yO | 


and accordingly 


; 3h : 
lYnt1 — yl s (ξ Keyes] γον Ξ γί! 
Thus convergence is ensured if 


= Kes <1 (6.6.14) 
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The rate of convergence is specified by the ratio of the magnitudes of the 
errors in successive iterates, and this ratio here is approximated by the absolute 
value of the convergence factor p,.4 such that 

3h 
Pr = gf? F νί χη» Yn) (6.6.15) 
In the case of the example in which 
y=x—-y y0)=1 
we would notice in advance that, near the beginning of the calculation, 


$$F_y = -2y \approx -2$$

Thus the convergence factor in the early steps would be about $-3h/4$. The
choice $h = 0.1$ then would appear to be acceptable since each iterate then would
tend to deviate from the limiting value $y_{n+1}$ by about one-thirteenth of the
deviation possessed by the preceding iterate.
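
The arithmetic behind the one-thirteenth figure can be checked with a short sketch. The function name `convergence_factor` and its arguments are illustrative assumptions, not notation from the text; the default coefficient 3/8 is the factor appearing in (6.6.15).

```python
def convergence_factor(h, f_y, alpha=3.0 / 8.0):
    """Approximate ratio of successive iterate errors, alpha * h * F_y, as in (6.6.15)."""
    return alpha * h * f_y

# Early steps of the example: F_y is about -2, and the trial spacing is h = 0.1.
rho = convergence_factor(0.1, -2.0)
print(rho)             # about -0.075, that is, -3h/4
print(1.0 / abs(rho))  # about 13.3, so each error is roughly 1/13 of the preceding one
```

Halving the spacing halves the factor, so the same function can be used to try a spacing before the calculation is begun.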

Usually, in practice, the spacing $h$ is taken to be so small that in fact no
iterations beyond the first correction $y_{n+1}^{(1)}$ are needed, so that $y_{n+1}^{(1)}$ itself is an
acceptable approximation to $Y_{n+1}$. In this connection, the following
consideration is useful.

If, temporarily, we assume that $y^{\mathrm{v}}(x) > 0$ for $x_{n-3} < x < x_{n+1}$, we
conclude from (6.6.8) that

$$Y_{n+1} > y_{n+1}^{(0)} \tag{6.6.16}$$

Also, from (6.6.9) it follows that, at least when $h$ is sufficiently small to ensure
iteration convergence, we have

$$Y_{n+1} < y_{n+1} \tag{6.6.17}$$

and, accordingly,

$$y_{n+1} > Y_{n+1} > y_{n+1}^{(0)} \tag{6.6.18}$$


Next, by comparing (6.6.5) with (6.6.7) when $i = 0$, we obtain the relation

$$Y_{n+1} - y_{n+1}^{(1)} = \frac{3h}{8}\,F_y(x_{n+1}, \bar{\eta}_{n+1})\,\bigl(Y_{n+1} - y_{n+1}^{(0)}\bigr) - \frac{19}{720}\,h^5 y^{\mathrm{v}}(\xi_2)$$

and hence, since the first term on the right is small of order $h^6$, it follows that,
at least for sufficiently small values of $h$, we have

$$Y_{n+1} < y_{n+1}^{(1)} \tag{6.6.19}$$
Finally, by comparing (6.6.3) with (6.6.7), we obtain

$$y_{n+1} - y_{n+1}^{(1)} = \frac{3h}{8}\,\bigl[F(x_{n+1}, y_{n+1}) - F(x_{n+1}, y_{n+1}^{(0)})\bigr]
= \frac{3h}{8}\,F_y(x_{n+1}, \eta_{n+1})\,\bigl(y_{n+1} - y_{n+1}^{(0)}\bigr) \tag{6.6.20}$$

Thus, if $F_y > 0$ near $(x_{n+1}, y_{n+1})$, it follows that (6.6.18) and (6.6.20)
imply the desired relation

$$y_{n+1} > y_{n+1}^{(1)} > Y_{n+1} \tag{6.6.21}$$


which states, in particular, that the first correction $y_{n+1}^{(1)}$ affords a better
approximation to $Y_{n+1}$ than does the limit of an infinite sequence of iterates,† the
difference being small of order $h^6$ when $h$ is small. In the alternative situation
when $y^{\mathrm{v}}(x) < 0$ for $x_{n-3} < x < x_{n+1}$, all inequalities are reversed and the same
conclusion follows.

Accordingly, unless it is established that $F_y < 0$ near $(x_{n+1}, y_{n+1})$,
iteration is as likely to worsen the approximation afforded by $y_{n+1}^{(1)}$ as to improve
it, on the average. It is probably best to calculate $\gamma_{n+1}$ as $y_{n+1}^{(1)} - y_{n+1}^{(0)}$, to
continue the calculation from point to point as long as, say, $|\gamma_{n+1}|$ is smaller
than seven units in the last place retained (so that the additional truncation error
introduced per step does not dominate the current roundoff-error contributions)
and to decrease the spacing $h$ when this situation no longer holds.

The preceding analysis generalizes immediately to include any predictor-corrector
process in which the predictor has a truncation error of the form
$c_1 h^p y^{(p)}(\xi_1)$ and the corrector has one of the form $-c_2 h^p y^{(p)}(\xi_2)$, where the order
$p$ is the same in both error terms and where $c_1$ and $c_2$ are positive. Then the
factor $\tfrac{270}{19}$ in (6.6.12) is to be replaced by $C = (c_1 + c_2)/c_2$, and the factor $\tfrac{3}{8}$ in
(6.6.11), (6.6.14), and (6.6.15) is to be replaced by the coefficient $\alpha_{-1}$ of $hy'_{n+1}$
in the corrector formula. It is easily seen that $\alpha_{-1}$ is the algebraic sum of the
coefficients of all backward differences retained (including the zeroth) when a
truncation of one of the formulas in Sec. 6.3 is used.

In particular, if (6.2.10) is used as a predictor for (6.3.2), the factor
$C = \tfrac{270}{19} \approx 14$ corresponding to retention of third differences is to be replaced
by 2 for no differences, 6 for first differences, 10 for second differences, $\tfrac{270}{19} \approx 14$
for third differences, and $\tfrac{502}{27} \approx 18$ when fourth differences are retained. That
is, if the difference between the prediction and the corrected value does not
exceed half the value listed, in units of the last place retained, then the truncation
error in each step probably does not exceed one-half unit in that place.
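
The listed factors can be recomputed from $C = (c_1 + c_2)/c_2$ with exact rational arithmetic. The sketch below is only a check: the coefficient lists are the leading truncation-error coefficients of the standard Adams open and closed series, which are assumed here because (6.2.10) and (6.3.2) are not reproduced in this section, and the names `PREDICTOR_C1`, `CORRECTOR_C2`, and `estimate_factor` are illustrative.

```python
from fractions import Fraction

# Leading truncation-error coefficients, assumed from the standard Adams series,
# indexed by the number of differences retained (0 through 4).
PREDICTOR_C1 = [Fraction(1, 2), Fraction(5, 12), Fraction(3, 8),
                Fraction(251, 720), Fraction(95, 288)]
CORRECTOR_C2 = [Fraction(1, 2), Fraction(1, 12), Fraction(1, 24),
                Fraction(19, 720), Fraction(3, 160)]

def estimate_factor(c1, c2):
    """Return C = (c1 + c2) / c2, so that the stepwise error is about -gamma / C."""
    return (c1 + c2) / c2

FACTORS = [estimate_factor(c1, c2) for c1, c2 in zip(PREDICTOR_C1, CORRECTOR_C2)]
for differences, C in enumerate(FACTORS):
    print(differences, C, float(C))
```

The same function gives the factor for any other predictor-corrector pair whose two error terms have the same order and the signs assumed above.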

Some computers treat $-\gamma_{n+1}/C$ as a modification, to be added to the
calculated $y_{n+1}$ in each step, while others prefer merely to control the magnitude
of this quantity and to use it only to provide warning or assurance with respect
to stepwise truncation errors. Likewise, since under the preceding assumptions
the approximate error in the prediction $y_{n+1}^{(0)}$ is $+(C - 1)\gamma_{n+1}/C$, some computers
add $(C - 1)\gamma_n/C$ to $y_{n+1}^{(0)}$, under the assumption that $\gamma_{n+1} \approx \gamma_n$, to define a
"modified" prediction.


† The occurrence of such situations was pointed out in a note by D. D. Wall [1956];
see also Henrici [1962].

We will refer to the preceding predictor-corrector method as the modified
Adams method (it is ascribed to Moulton [1926]). The procedure based on 
retaining only the first difference in (6.3.2) is often called the modified Euler 
method. 

Milne's method differs from the methods just described in that it uses
(6.3.6) for correction and (6.2.18) for prediction, retaining second differences.
The truncation error in the nth step can be estimated as $-(y_{n+1} - y_{n+1}^{(0)})/29$.
This method possesses the advantage that the truncation error in each step is
proportional to $h^5$, whereas retention of only second differences in the modified
Adams method leads to stepwise truncation errors of order $h^4$. On the other
hand (as will be indicated in Sec. 6.7), it compares unfavorably with the Adams
method in regard to possible error growth.

It is obvious that each of the formulas considered could be expressed 
explicitly in terms of the values of the derivative y’, in place of differences of 
those values, once a decision was made as to the number of differences which 
were to be effectively retained. Thus, for example, the Milne second-difference 
procedure can be based on Eqs. (6.2.18) and (6.3.6) or, equivalently, on the 
equations 


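$$y_{n+1} = y_{n-3} + \frac{4h}{3}\,(2y'_n - y'_{n-1} + 2y'_{n-2}) + \frac{14h^5}{45}\,y^{\mathrm{v}}(\xi) \tag{6.6.22}$$

and

$$y_{n+1} = y_{n-1} + \frac{h}{3}\,(y'_{n+1} + 4y'_n + y'_{n-1}) - \frac{h^5}{90}\,y^{\mathrm{v}}(\xi) \tag{6.6.23}$$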


where, of course, the values of $\xi$ in the two equations are generally unequal.
The second equation is seen to be equivalent to Simpson's rule.

This procedure not only avoids the calculation and storage of differences,
but also possesses an additional advantage in that then only the entries $y_{n+1}$
and $y'_{n+1}$ are modified in successive steps of any iteration process. However,
whereas these advantages are of particular significance when large-scale com- 
puting devices are used, so that simplicity in programming and minimization 
of storage requirements are of prime importance, they may compare unfavorably 
in other cases with the advantages which follow from the possibility of con- 
sidering the regularity and the trend of the difference columns. 


6.7 The Special Case F = Ay 
Each of the formulas treated in the preceding sections is expressible in the form 


$$y_{n+1} = y_{n-p} + h(\alpha_{-1}y'_{n+1} + \alpha_0 y'_n + \alpha_1 y'_{n-1} + \cdots + \alpha_r y'_{n-r}) \tag{6.7.1}$$

where $\alpha_{-1} = 0$ for the formulas of open type, and where

$$y'_k = F(x_k, y_k) \tag{6.7.2}$$

and, accordingly, is a member of the more extensive class of so-called $(p + 1)$-step
formulas. If, in fact, it is the result of retaining $r$ differences in one of the
formulas considered in Sec. 6.2, or $r + 1$ differences in one of the formulas in
Sec. 6.3, then it will reduce to an identity if $y(x)$ is a polynomial of degree $r + 2$
or less, when $\alpha_{-1} \ne 0$, and of degree $r + 1$ or less, when $\alpha_{-1} = 0$.
In the case when the differential equation is of the very special form 
$$\frac{dy}{dx} = Ay \tag{6.7.3}$$

so that $F(x, y) = Ay$, where $A$ is a constant, the relation (6.7.1) takes the form

$$(1 - \alpha_{-1}Ah)\,y_{n+1} = y_{n-p} + Ah(\alpha_0 y_n + \alpha_1 y_{n-1} + \cdots + \alpha_r y_{n-r}) \tag{6.7.4}$$


and is subject to a simple analysis, the results of which are helpful in
understanding the propagation of errors in the more general case. It may be noticed
that the exact solution of (6.7.3), subject to the condition $y(x_0) = y_0$, is

$$y(x) = y_0 e^{A(x - x_0)} \tag{6.7.5}$$

In order to fix ideas, we suppose that $r \ge p$ and so include most of the commonly
used formulas,† such as formulas (6.2.10) and (6.3.2), for which $p = 0$, and the
formulas (6.3.6) and (6.3.7), for which $r = p$. As will be seen, this restriction is
easily removed.

The relation (6.7.4) then affords a linear relation among the $r + 2$
ordinates $y_{n+1}, y_n, y_{n-1}, \ldots,$ and $y_{n-r}$ (one of which is identical with $y_{n-p}$)
and can be considered as a difference equation of order $r + 1$, under the
assumption that $\alpha_{-1}Ah \ne 1$ and $\alpha_r \ne 0$. It holds only for $n \ge r$, the ordinate $y_0$
being prescribed, and the remaining $r$ initial ordinates $y_1, y_2, \ldots, y_r$ supposedly
being supplied by an independent calculation.

We may notice that $y_n = \beta^n$ will satisfy (6.7.4), with $\beta$ constant, if $\beta$ is
determined such that

$$(1 - \alpha_{-1}Ah)\,\beta^{n+1} = \beta^{n-p} + Ah(\alpha_0\beta^n + \alpha_1\beta^{n-1} + \cdots + \alpha_r\beta^{n-r})$$

or, after removing the common factor $\beta^{n-r}$, such that $\beta$ satisfies the characteristic
equation

$$(1 - \alpha_{-1}Ah)\,\beta^{r+1} - Ah(\alpha_0\beta^r + \alpha_1\beta^{r-1} + \cdots + \alpha_r) - \beta^{r-p} = 0 \tag{6.7.6}$$

Since $p$ and $r$ are nonnegative integers, such that $r - p \ge 0$, this relation
is an algebraic equation of degree $r + 1$ in $\beta$ and hence possesses $r + 1$ roots
$\beta_0, \beta_1, \ldots, \beta_r$, which may be real or imaginary.


† When p > r, as when one of the open formulas (6.2.17) to (6.2.19) is used (without
a corrector), the characteristic equation is of degree p + 1 and corresponding minor 
changes must be made in the analysis which follows.

If no roots are repeated, then, from the linearity and homogeneity of the
difference equation (6.7.4), it follows that

$$y_n = c_0\beta_0^n + c_1\beta_1^n + \cdots + c_r\beta_r^n \tag{6.7.7}$$

satisfies (6.7.4) for arbitrary values of the $r + 1$ independent constants $c_0,
c_1, \ldots, c_r$, which are available for satisfying the $r + 1$ initial conditions which
prescribe $y_0, y_1, \ldots, y_r$. It can be shown that (6.7.7) then represents the most
general solution of (6.7.4), when $n$ is restricted to integral values.

If $\beta_1 = \beta_2$, the terms $c_1\beta_1^n + c_2\beta_2^n$ are to be replaced by $\beta_1^n(c_1 + c_2 n)$,
as is easily verified. Furthermore, if $\beta_1$ and $\beta_2$ are conjugate complex, so that

$$\beta_1 = \rho e^{i\theta} \qquad \beta_2 = \rho e^{-i\theta}$$

where $\rho = |\beta_1| = |\beta_2|$, we may replace $c_1$ and $c_2$ by $\tfrac12(c_1 - ic_2)$ and $\tfrac12(c_1 + ic_2)$
and rewrite the corresponding two terms in (6.7.7) in the more convenient form

$$\rho^n(c_1\cos n\theta + c_2\sin n\theta)$$

It remains to investigate the roots of the characteristic equation (6.7.6).
We may notice first that, when $h = 0$, the equation reduces to

$$\beta^{r-p}(\beta^{p+1} - 1) = 0$$

so that $\beta = 0$ is then a root of multiplicity $r - p$, and the remaining $p + 1$
roots are the $(p + 1)$th roots of unity. In the complex $\beta$ plane (Fig. 6.1),
$r - p$ roots coincide at the origin, whereas the remaining $p + 1$ roots are
equally spaced about the unit circle $|\beta| = 1$, with one root at the point $\beta = 1$.
When $h$ is small, the $r + 1$ roots will generally be distinct, with $r - p$ roots
near the origin, and $p + 1$ roots in the neighborhood of the unit circle.

[FIGURE 6.1: roots in the complex $\beta$ plane; vertical axis Im($\beta$)]

In particular, if we denote by $\beta_0$ that root which tends to unity as $h$ tends
to zero, we may write

$$\beta_0 = 1 + m_1 h + m_2 h^2 + \cdots \tag{6.7.8}$$

where the coefficients $m_1, m_2, \ldots$ are to be determined in such a way that
the result of replacing $\beta$ by $\beta_0$ in (6.7.6), and expanding the result in powers of
$h$, reduces identically to zero. A simple calculation then shows that the result
of that substitution is of the form

$$h[(p + 1)m_1 - A(\alpha_{-1} + \alpha_0 + \alpha_1 + \cdots + \alpha_r)] + h^2[\cdots] + \cdots = 0$$

and hence, in particular, that we must have

$$m_1 = \frac{A}{p + 1}\,(\alpha_{-1} + \alpha_0 + \alpha_1 + \cdots + \alpha_r) \tag{6.7.9}$$

But, under the assumption that the integration formula which led to (6.7.1)
gives exact results when applied to the integration of a constant, that is, for
$y'(x) = 1$, we can deduce from (6.7.1) the relation

$$\alpha_{-1} + \alpha_0 + \alpha_1 + \cdots + \alpha_r = p + 1 \tag{6.7.10}$$

and so find that $m_1 = A$. Accordingly, one root of (6.7.6) can be expressed in
the form

$$\beta_0 = 1 + Ah + O(h^2) \tag{6.7.11}$$

where, as always, the symbol $O(h^2)$ represents a term which is small, of the order
of $h^2$, when $h$ is small.
The corresponding part of the solution (6.7.7) is thus of the form

$$c_0[1 + Ah + O(h^2)]^n = c_0[1 + Ah + O(h^2)]^{(x_n - x_0)/h}$$

and is approximated by $c_0 e^{A(x_n - x_0)}$ when $h$ is small. Thus we see that this part
of the general solution of the difference equation tends toward the general
solution of the approximated differential equation as $h \to 0$ and, indeed, tends
toward the required solution, for which $y(x_0) = y_0$, if $c_0 \to y_0$ as $h \to 0$.

The remaining $r$ terms in (6.7.7) represent so-called parasitic solutions,
which correspond to the fact that the order of the difference equation exceeds
the order of the simulated differential equation by $r$. For small values of $h$,
we have seen that $r - p$ of the roots $\beta_k$ will be small in magnitude, relative to
unity, and hence that the corresponding terms $\beta_k^n$ will tend rapidly to zero as
the calculation proceeds and $n$ increases.

However, if $p > 0$, there are $p$ roots in addition to $\beta_0$ which are of unit
absolute value when $h = 0$. If, for $h > 0$, any one of these roots, say, $\beta_k$,
has a magnitude greater than unity, then (unless the coefficient $c_k$ happens to
vanish) the corresponding term $c_k\beta_k^n$ will increase unboundedly in magnitude
as $n$ increases.

In illustration, if use is made of the simplest formula of open type (Euler's
formula),

$$y_{n+1} = y_n + hy'_n = (1 + Ah)\,y_n \tag{6.7.12}$$

with $p = r = 0$, the only root is $\beta_0 = 1 + Ah$, and hence the solution is

$$y_n = y_0(1 + Ah)^n = y_0(1 + Ah)^{(x_n - x_0)/h} \tag{6.7.13}$$

which does indeed approximate (6.7.5) when $h$ is small.
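For the open formula with $p = 0$ and $r = 1$,

$$y_{n+1} = y_n + h\left(y'_n + \tfrac12\nabla y'_n\right) = y_n + \frac{h}{2}\,(3y'_n - y'_{n-1})
= \left(1 + \tfrac32 Ah\right)y_n - \tfrac12 Ah\,y_{n-1} \tag{6.7.14}$$

the characteristic equation (6.7.6) becomes

$$\beta^2 - \left(1 + \tfrac32 Ah\right)\beta + \tfrac12 Ah = 0$$

and yields

$$\beta_0 = \tfrac12 + \tfrac34 Ah + \tfrac12\sqrt{1 + Ah + \tfrac94 A^2h^2} = 1 + Ah + \tfrac12 A^2h^2 + \cdots$$

and

$$\beta_1 = \tfrac12 + \tfrac34 Ah - \tfrac12\sqrt{1 + Ah + \tfrac94 A^2h^2} = \tfrac12 Ah\,(1 - Ah + \cdots)$$

Thus, for small $h$, the solution is of the form

$$y_n = c_0(1 + Ah + \cdots)^{(x_n - x_0)/h} + c_1\left(\tfrac12 Ah + \cdots\right)^{(x_n - x_0)/h} \tag{6.7.15}$$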


If $c_0$ and $c_1$ are determined such that $y_n = c_0\beta_0^n + c_1\beta_1^n$ reduces to $y_0$ and $y_1$
for $n = 0$ and 1, respectively, there follows

$$c_0 = \frac{y_1 - \beta_1 y_0}{\beta_0 - \beta_1} \qquad c_1 = \frac{\beta_0 y_0 - y_1}{\beta_0 - \beta_1} \tag{6.7.16}$$

The ordinate $y_1$ is assumed to be supplied by another method. If we assume
that $y_1$ differs from the true value $y_0 e^{Ah} = y_0(1 + Ah + \cdots)$, at worst, by an
amount of order $h$, it is easily seen that $c_0$ differs from $y_0$ and $c_1$ from zero by
an amount at worst of order $h$. Hence here the parasitic solution is small when
$h$ is small, and also it tends to zero as $n \to \infty$ for any fixed value of $h$ which is
sufficiently small to make $|\beta_1| < 1$.†
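
The two roots of the quadratic characteristic equation for this open formula are easy to evaluate directly. The following sketch (the function name `open_formula_roots` and the variable `z`, standing for the product $Ah$, are illustrative assumptions) shows that the parasitic root stays inside the unit circle for positive $Ah$, and for negative $Ah$ only while $|Ah| < 1$.

```python
import math

def open_formula_roots(z):
    """Roots of beta^2 - (1 + 3z/2)*beta + z/2 = 0, where z = A*h."""
    root = math.sqrt(1.0 + z + 2.25 * z * z)  # the radicand is positive for all real z
    beta0 = 0.5 + 0.75 * z + 0.5 * root       # principal root, near exp(z) for small z
    beta1 = 0.5 + 0.75 * z - 0.5 * root       # parasitic root, near z/2 for small z
    return beta0, beta1

for z in (0.1, -0.1, -0.5, -1.0, -1.5):
    b0, b1 = open_formula_roots(z)
    print(z, b0, math.exp(z), b1)
```

At $Ah = -1$ the parasitic root is exactly $-1$, which marks the boundary of the range in which the parasitic solution decays.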

As an example in which $p > 0$, we may notice that if the Simpson's
rule formula (6.6.23) is used in the form

$$y_{n+1} = y_{n-1} + \frac{Ah}{3}\,(y_{n+1} + 4y_n + y_{n-1}) \tag{6.7.17}$$

Eq. (6.7.6) becomes

$$\left(1 - \frac{Ah}{3}\right)\beta^2 - \frac{4Ah}{3}\,\beta - \left(1 + \frac{Ah}{3}\right) = 0$$

with roots expressible in the forms

$$\beta_0 = 1 + Ah + \cdots \qquad \beta_1 = -\left(1 - \tfrac13 Ah + \cdots\right)$$

when $h$ is small. Thus the solution of (6.7.17) is expressible in the form

$$y_n = c_0(1 + Ah + \cdots)^n + c_1(-1)^n\left(1 - \tfrac13 Ah + \cdots\right)^n
\approx c_0 e^{A(x_n - x_0)} + (-1)^n c_1 e^{-A(x_n - x_0)/3} \tag{6.7.18}$$

when $h$ is small.

When $A$ is positive, so that the exact solution grows exponentially with
$x$, the root $\beta_1$ lies inside the circle $|\beta| = 1$, and the parasitic solution accordingly
damps out exponentially in magnitude, as the calculation proceeds with
increasing $n$. However, when $A$ is negative, so that the exact solution tends
exponentially to zero as $x$ increases, the parasitic solution increases exponentially
in magnitude and, in addition, alternates in sign from step to step in an
advancing calculation. To a first approximation, $c_1$ is found to be half the difference
between the value of $y_1$ used in the calculation and the true value $y_0 e^{Ah}$.
However, the value which should be assigned to $y_1$ in order that $c_1$ vanish exactly
is not the true value, but a value which tends to the true value as $h \to 0$. That is,
the parasitic solution would be present even though $y_0$ and $y_1$ were exactly
correct. It is important to notice also that each roundoff committed at any
stage of the advancing calculation will initiate a new parasitic solution, of the
same type.
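
The behavior of the two Simpson's rule roots can be confirmed numerically. This is a minimal sketch, with the illustrative name `simpson_roots` and with `z` standing for $Ah$; it assumes $|Ah| < 3$ so that the leading coefficient is positive and the roots are real.

```python
import math

def simpson_roots(z):
    """Roots of (1 - z/3)*b^2 - (4z/3)*b - (1 + z/3) = 0, where z = A*h and |z| < 3."""
    a = 1.0 - z / 3.0
    b = -4.0 * z / 3.0
    c = -(1.0 + z / 3.0)
    disc = math.sqrt(b * b - 4.0 * a * c)
    principal = (-b + disc) / (2.0 * a)  # near exp(z)
    parasitic = (-b - disc) / (2.0 * a)  # near -exp(-z/3)
    return principal, parasitic

for z in (0.1, -0.1):
    beta0, beta1 = simpson_roots(z)
    print(z, beta0, math.exp(z), beta1, -math.exp(-z / 3.0))
```

For negative $Ah$ the second root has magnitude greater than unity, so a parasitic term of size $c_1$ is multiplied by about $e^{|A|h/3}$, with a change of sign, in every step.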

In the somewhat more general case when the differential equation is of the
form $y' = Ay + Bx + C$, where $A$, $B$, and $C$ are constants, a linear function
of $x$ accordingly is added to the exponential term present in the true solution
when $B = C = 0$. It is found that the same modification occurs in the solution
of the approximating difference equation, so that a linear function of $n$ is
merely added to the right-hand member of (6.7.7). Thus the same parasitic
solutions are present, and the preceding discussion again applies, except for the
fact that here the true solution will not decrease in magnitude as $x$ increases
and when $A$ is negative, but will grow linearly, while the parasitic solutions may
grow exponentially.

† If $A \ge 0$, the requirement $|\beta_1| < 1$ is satisfied for all $h$. However, if $A < 0$, the
spacing $h$ must be such that $h < 1/|A|$.

Finally, in the general case, when we are concerned with an equation of 
the form y’ = F(x, y), we may imagine that F(x, y) is replaced by the linear 
approximation 


$$F(x, y) \approx F(x_n, y_n) + (x - x_n)F_x(x_n, y_n) + (y - y_n)F_y(x_n, y_n)$$

in the neighborhood of a point $(x_n, y_n)$, and so imagine that the differential
equation is replaced by the linear equation $y' = A_n y + B_n x + C_n$, where

$$A_n = F_y(x_n, y_n) \qquad B_n = F_x(x_n, y_n)$$
$$C_n = F(x_n, y_n) - x_n F_x(x_n, y_n) - y_n F_y(x_n, y_n)$$


It is then plausible (but not always true) that the nature of the error propagated 
in the numerical solution of the true equation will be simulated by that for the 
linearized equation, over a short range near x,. The situations in which no one 
of the parasitic terms tends to dominate the term simulating the true solution, 
as the calculation proceeds from that point, are often said to be characterized by 
short-range stability.†

† Various definitions of numerous types of stability and instability occur in the
literature.

In order to illustrate the occurrence of instability, we present in Table 6.2
the results of calculations based on the problem

$$y' = -2y + 2 \qquad y(0) = 2$$

The entries in the column headed $y_M$ are values determined by the Milne
method, using (6.6.22) for prediction and (6.6.23) (Simpson's rule) for correction.
The entries $\gamma_M$ represent the differences between the final results and the initial
predictions, and $-\gamma_M/29$ affords the Milne estimate of the truncation error in
each step. The entries in the column headed $y_A$ were obtained by the modified
Adams method, using (6.2.10) with third differences as a predictor and (6.3.2)
with third differences as a corrector. The estimated truncation error in each
step is afforded by $-\gamma_A/14$. The entries in the $y_T$ column are values of the true
solution $y(x) = e^{-2x} + 1$, rounded to three decimal places. As has been noted,
both numerical procedures introduce truncation errors of order $h^5$ in each step.

Table 6.2

  x      y_M    γ_M     y_A    γ_A     y_T
 0.0    2.000     -    2.000     -    2.000
 0.5    1.368     -    1.368     -    1.368
 1.0    1.132     -    1.135     -    1.135
 1.5    1.052     -    1.046     -    1.050
 2.0    1.014   -42    1.016   -64    1.018
 2.5    1.012   -36    1.005   -17    1.007
 3.0    0.995    15    1.002   -10    1.002
 3.5    1.011   -33    1.001    -2    1.001
 4.0    0.986    40    1.000    -2    1.000
 4.5    1.020   -57    1.000    -1    1.000

A large spacing is chosen deliberately, and all calculations are rounded 
to three decimal places, in order to cause the effects of the error propagation to 
become evident at a relatively early stage of the process. The requisite starting 
values (above the broken lines) are correct to the places retained. 

The tabulation is intended, not only to show the increasing oscillation
of the first solution about the true solution, but also to serve as a reminder that
the quantity $-\gamma_M/29$ affords at best only an estimate of the truncation error
introduced in each step and does not, in itself, indicate the manner in which
the effects of that error (and other errors) are propagated. Thus, for example,
the fact that $-\gamma_M/29$ is smaller than 2 in each step must not be interpreted
as indicating that the accumulated error at each step is less than 2 units in the
last place. In fact, that error is seen to amount to $-20$ units in $y_M(4.5)$ in the
present case, and its magnitude exceeds the sum of the magnitudes of the
individual error estimates.
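
The contrast between the two methods can be reproduced in outline with the following sketch. It is not a recomputation of Table 6.2: it assumes exact starting values, carries full floating-point precision instead of three decimal places, iterates each corrector to convergence, writes the Adams formulas in their ordinate form (an assumption, since only the difference forms are cited here), and continues to $x = 10$. The names `integrate` and `solve_corrector` are illustrative.

```python
import math

H = 0.5  # spacing used in Table 6.2

def F(x, y):
    # Right-hand side of the illustrative problem y' = -2y + 2.
    return -2.0 * y + 2.0

def exact(x):
    return math.exp(-2.0 * x) + 1.0

def solve_corrector(corrector, y_pred, tol=1e-13, max_iter=200):
    """Iterate y <- corrector(y) from the prediction until successive values agree."""
    y = y_pred
    for _ in range(max_iter):
        y_next = corrector(y)
        if abs(y_next - y) < tol:
            return y_next
        y = y_next
    return y

def integrate(method, n_steps, h=H):
    """Advance from four exact starting values; return the list of ordinates."""
    ys = [exact(k * h) for k in range(4)]
    for n in range(3, n_steps):
        x_next = (n + 1) * h
        d0, d1, d2, d3 = (F((n - k) * h, ys[n - k]) for k in range(4))
        if method == "milne":
            # Predictor (6.6.22) and Simpson's rule corrector (6.6.23).
            pred = ys[n - 3] + 4.0 * h / 3.0 * (2.0 * d0 - d1 + 2.0 * d2)
            corr = lambda y: ys[n - 1] + h / 3.0 * (F(x_next, y) + 4.0 * d0 + d1)
        else:
            # Adams predictor and corrector with third differences, ordinate form.
            pred = ys[n] + h / 24.0 * (55.0 * d0 - 59.0 * d1 + 37.0 * d2 - 9.0 * d3)
            corr = lambda y: ys[n] + h / 24.0 * (9.0 * F(x_next, y) + 19.0 * d0 - 5.0 * d1 + d2)
        ys.append(solve_corrector(corr, pred))
    return ys

N_STEPS = 20  # reaches x = 10, beyond the range of the table
milne = integrate("milne", N_STEPS)
adams = integrate("adams", N_STEPS)
milne_err = [exact(k * H) - y for k, y in enumerate(milne)]
adams_err = [exact(k * H) - y for k, y in enumerate(adams)]
for k in range(4, N_STEPS + 1, 4):
    print(k * H, milne_err[k], adams_err[k])
```

Under these assumptions the Milne errors alternate in sign and grow from step to step, whereas the Adams errors die away with the solution itself.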

In addition to formulas which are specializations of (6.7.1), it is possible
to derive an extensive class of other $(p + 1)$-step simulations to the differential
equation

$$y' = F(x, y)$$

in the more general form

$$y_{n+1} + \beta_0 y_n + \beta_1 y_{n-1} + \cdots + \beta_p y_{n-p}
= h(\alpha_{-1}y'_{n+1} + \alpha_0 y'_n + \alpha_1 y'_{n-1} + \cdots + \alpha_r y'_{n-r}) \tag{6.7.19}$$

where $\beta_p \ne 0$. In particular, when $p = 2$ and $r = 1$, the six adjustable constants
can be made to yield a one-parameter family of correctors with accuracy of
order 4 (that is, with truncation-error terms of order $h^5$), including, for example,
the third-difference Adams corrector (6.6.2) and the Milne (Simpson's rule)
corrector (6.6.23).

These formulas can be obtained operationally by requiring that the relation

$$(E + \beta_0 + \beta_1 E^{-1} + \beta_2 E^{-2})\,y = h(\alpha_{-1}E + \alpha_0 + \alpha_1 E^{-1})\,F$$

become equivalent to the relation

$$Dy = F$$

with $E = e^{hD}$, when an error term of order $h^5$ is introduced in the former
relation. Thus the $\alpha$'s and $\beta$'s are to be such that

$$e^t + \beta_0 + \beta_1 e^{-t} + \beta_2 e^{-2t} = t(\alpha_{-1}e^t + \alpha_0 + \alpha_1 e^{-t}) + O(t^5)$$

so that there follow five conditions relating the six available parameters.
No study of this set is undertaken here. However, we note that the member
for which $\beta_1 = 0$ is of the form

$$y_{n+1} = \tfrac18\,(9y_n - y_{n-2}) + \tfrac{3h}{8}\,(y'_{n+1} + 2y'_n - y'_{n-1}) - \tfrac{1}{40}\,h^5 y^{\mathrm{v}}(\xi) \tag{6.7.20}$$

This formula was first advocated by Hamming [1959], who showed that it is a
commendable member of a set of formulas which (apart from other considerations)
combine the stability properties of the third-difference Adams corrector
with the advantage of requiring only three evaluations of $F(x, y)$ per calculation,
as does the Milne corrector, whereas the third-difference Adams corrector
requires four such evaluations. It can be used with the fourth-order Milne
predictor (6.6.22), which also requires only three $F$ evaluations, in which case
the stepwise truncation-error estimate considered in Sec. 6.6 is found to be
$-9(y_{n+1} - y_{n+1}^{(0)})/121$.
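
The order of accuracy and the error coefficient of (6.7.20) can be confirmed by applying the formula, without its error term, to the monomials $y = x^k$. The sketch below uses exact rational arithmetic; the function name `hamming_residual` and the choice $x_n = 0$, $h = 1$ are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial

def hamming_residual(k, h=Fraction(1)):
    """Exact y(x_{n+1}) minus the corrector value of (6.7.20) for y = x^k, with x_n = 0."""
    y = lambda x: Fraction(x) ** k
    dy = lambda x: k * Fraction(x) ** (k - 1) if k > 0 else Fraction(0)
    corrector = (Fraction(1, 8) * (9 * y(0) - y(-2 * h))
                 + Fraction(3, 8) * h * (dy(h) + 2 * dy(0) - dy(-h)))
    return y(h) - corrector

RESIDUALS = [hamming_residual(k) for k in range(6)]
print(RESIDUALS)
```

The residual vanishes through degree 4, and for $y = x^5$ it equals $-\tfrac{1}{40}h^5 \cdot 5!$, in agreement with the error term quoted in (6.7.20).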


6.8 Propagated-Error Bounds 


In actual calculation, the calculated value of $y_{n+1}$ generally will not be given
exactly by the right-hand member of the relevant formula (6.7.1), because of the
necessity of effecting roundoffs. If we replace $y_{n+1}$ by $y_{n+1} + R_n$, where $R_n$
is inserted to account for the effects of roundoff in the nth step, Eqs. (6.7.1)
and (6.7.2) can be combined in the form

$$y_{n+1} = y_{n-p} + h\sum_{k=-1}^{r}\alpha_k F(x_{n-k}, y_{n-k}) - R_n \tag{6.8.1}$$

On the other hand, if we denote the value of the true solution of the given
problem when $x = x_n$ by $Y_n$, we have also the relation

$$Y_{n+1} = Y_{n-p} + h\sum_{k=-1}^{r}\alpha_k F(x_{n-k}, Y_{n-k}) + T_n \tag{6.8.2}$$

where we here denote the truncation error corresponding to the nth step by $T_n$.
If we subtract (6.8.1) from (6.8.2) and write

$$\varepsilon_n = Y_n - y_n \qquad E_n = T_n + R_n \tag{6.8.3}$$

we find that the error $\varepsilon_n$ associated with the calculated value $y_n$ satisfies the
difference equation

$$\varepsilon_{n+1} = \varepsilon_{n-p} + h\sum_{k=-1}^{r}\alpha_k\,[F(x_{n-k}, Y_{n-k}) - F(x_{n-k}, y_{n-k})] + E_n \tag{6.8.4}$$


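In order to obtain a bound on the magnitude of $\varepsilon_n$, we notice first that we may
write

$$F(x_i, Y_i) - F(x_i, y_i) = (Y_i - y_i)\,F_y(x_i, \eta_i) = \varepsilon_i F_y(x_i, \eta_i) \tag{6.8.5}$$

if $F_y = \partial F/\partial y$ exists, where $\eta_i$ is between $y_i$ and $Y_i$, so that (6.8.4) can be written
in the form

$$[1 - h\alpha_{-1}F_y(x_{n+1}, \eta_{n+1})]\,\varepsilon_{n+1}
= \varepsilon_{n-p} + h\sum_{k=0}^{r}\alpha_k\,\varepsilon_{n-k}F_y(x_{n-k}, \eta_{n-k}) + E_n \tag{6.8.6}$$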

Suppose now that, for the range of values of $x$ and $y$ involved in the overall
calculation, we have

$$|F_y(x, y)| \le K \tag{6.8.7}$$

where $K$ is a known constant, and consider the related difference equation

$$(1 - Kh|\alpha_{-1}|)\,e_{n+1} = e_{n-p} + Kh\sum_{k=0}^{r}|\alpha_k|\,e_{n-k} + E \tag{6.8.8}$$

where $E$ is such that

$$|E_n| \le E \qquad (n = r, r + 1, \ldots) \tag{6.8.9}$$

If $\alpha_{-1} \ne 0$, so that the formula is of closed type, suppose also that $h$ is
sufficiently small to ensure that

$$Kh|\alpha_{-1}| < 1 \tag{6.8.10}$$
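From (6.8.6) and (6.8.7), we have

$$|1 - h\alpha_{-1}F_y(x_{n+1}, \eta_{n+1})|\,|\varepsilon_{n+1}| \le |\varepsilon_{n-p}| + Kh\sum_{k=0}^{r}|\alpha_k|\,|\varepsilon_{n-k}| + |E_n|$$

and hence, if $|\varepsilon_n| \le e_n$, $|\varepsilon_{n-1}| \le e_{n-1}, \ldots, |\varepsilon_{n-r}| \le e_{n-r}$, there follows also

$$|1 - h\alpha_{-1}F_y(x_{n+1}, \eta_{n+1})|\,|\varepsilon_{n+1}| \le (1 - Kh|\alpha_{-1}|)\,e_{n+1}$$

and thus $|\varepsilon_{n+1}| \le e_{n+1}$. That is, if $|\varepsilon_i| \le e_i$ for $r + 1$ successive integral
values of $i$, then, by induction, the same is true for all succeeding integral
values of $i$.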

Now the error $\varepsilon_0$ vanishes except for roundoff since $y_0 = Y_0$ is prescribed.
Also, again assuming that $r \ge p$, the errors $\varepsilon_1, \ldots, \varepsilon_r$ are errors associated with
the starting values $y_1, \ldots, y_r$, supplied by an independent analysis. Let $\delta$ be a
positive number which is not exceeded in magnitude by any of these initial
errors. Then, if $e_n$ is a solution of (6.8.8) which is not smaller than $\delta$ for $n = 0,
1, \ldots, r$, it follows that

$$|\varepsilon_n| \le e_n$$

for all relevant values of $n$. That is, any such solution of (6.8.8) will "dominate"
the solution of (6.8.4).
Since the nonhomogeneous term $E$ in (6.8.8) is a constant, a particular
solution of (6.8.8) may be assumed in the form $e_n = -\lambda$, where $\lambda$ is a constant,
and the introduction of this assumption leads to the determination

$$\lambda = \frac{E}{Kh\sigma} \tag{6.8.11}$$

with the additional abbreviation

$$\sigma = \sum_{k=-1}^{r}|\alpha_k| \tag{6.8.12}$$

It may be noticed that $\sigma = p + 1$ when all the $\alpha$'s are positive, and also that
$\sigma \ge p + 1$ in any case, by virtue of (6.7.10).

To this particular solution may be added any multiple of $\beta^n$, where $\beta$
is determined such that $\beta^n$ satisfies the homogeneous difference equation
obtained by replacing $E$ by zero in (6.8.8), and where $\beta$ accordingly must satisfy
the characteristic equation

$$(1 - Kh|\alpha_{-1}|)\,\beta^{r+1} - Kh(|\alpha_0|\beta^r + |\alpha_1|\beta^{r-1} + \cdots + |\alpha_{r-1}|\beta + |\alpha_r|) - \beta^{r-p} = 0 \tag{6.8.13}$$

Since the left-hand member is negative when $\beta = 1$ and tends to $+\infty$ as
$\beta \to +\infty$, there is a positive real root $\beta_0$ which is larger than unity. Indeed,
for small values of $h$, it is found to be expressible in the form

$$\beta_0 = 1 + \frac{Kh\sigma}{p + 1} + O(h^2) \tag{6.8.14}$$

With this value of $\beta$, and the value of $\lambda$ defined by (6.8.11), it follows that
$e_n = c\beta_0^n - \lambda$ satisfies (6.8.8) for any constant value of $c$. In addition, since it
increases steadily with $n$, if we determine $c$ in such a way that $e_0 = \delta$, so that

$$e_n = \delta\beta_0^n + \lambda(\beta_0^n - 1) \tag{6.8.15}$$

then we will have $e_n \ge \delta$ for all $n \ge 0$. Thus this particular solution of (6.8.8)
will dominate the solution of (6.8.4).

Hence, in summary, we deduce that the error $\varepsilon_n$ associated with the value
of $y_n$ determined by step-by-step calculation based on the formula

$$y_{n+1} = y_{n-p} + h\sum_{k=-1}^{r}\alpha_k F(x_{n-k}, y_{n-k}) \qquad (n \ge r) \tag{6.8.16}$$

is limited by the inequality

$$|\varepsilon_n| \le \delta\beta_0^n + \lambda(\beta_0^n - 1) \tag{6.8.17}$$

where $\delta$ is the absolute value of the largest error associated with the $r + 1$ starting
values $y_0, y_1, \ldots, y_r$; $\lambda$ is defined by the equations

$$\lambda = \frac{E}{Kh\sigma} \qquad \sigma = \sum_{k=-1}^{r}|\alpha_k| \tag{6.8.18}$$

$K$ is the maximum value of $|\partial F/\partial y|$ for the range of values of $x$ and $y$ involved
in the calculation; $E$ is the absolute value of the maximum total error introduced
in each step; and $\beta_0$ is the positive real root of the equation

$$\beta^{r+1} = \beta^{r-p} + Kh\sum_{k=-1}^{r}|\alpha_k|\,\beta^{r-k} \tag{6.8.19}$$

which exceeds unity.

In those cases when the coefficients $\alpha_k$ are all positive, reference to (6.7.10)
shows that (6.8.11) reduces to $\lambda = E/[Kh(p + 1)]$ and that the expansion
(6.8.14) becomes $\beta_0 = 1 + Kh + O(h^2)$. Also, the characteristic equation
(6.8.13) is then equivalent to Eq. (6.7.6) with $A$ replaced by $K$, and finally
$\beta_0^n \approx \exp[K(x_n - x_0)]$.
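
The bound (6.8.17) is simple to evaluate once the root $\beta_0$ of (6.8.19) has been located. The following sketch finds that root by bisection; the function names, the argument order, and the convention that `alphas` lists $\alpha_{-1}, \alpha_0, \ldots, \alpha_r$ in that order are illustrative assumptions, and $r \ge p$ together with (6.8.10) is presupposed.

```python
def dominant_root(alphas, p, K, h):
    """Root beta_0 > 1 of (6.8.19); alphas = [alpha_{-1}, alpha_0, ..., alpha_r]."""
    r = len(alphas) - 2
    if K * h * abs(alphas[0]) >= 1.0:
        raise ValueError("requires K*h*|alpha_{-1}| < 1, as in (6.8.10)")

    def g(beta):
        # Entry j of alphas is alpha_{j-1}, which multiplies beta^(r - (j - 1)).
        total = sum(abs(a) * beta ** (r + 1 - j) for j, a in enumerate(alphas))
        return beta ** (r + 1) - beta ** (r - p) - K * h * total

    lo, hi = 1.0, 2.0
    while g(hi) <= 0.0:    # widen the bracket until the sign changes
        hi *= 2.0
    for _ in range(200):   # bisection, keeping g(lo) < 0 <= g(hi)
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def error_bound(n, delta, E, alphas, p, K, h):
    """Right-hand member of (6.8.17): delta*beta0^n + lam*(beta0^n - 1)."""
    sigma = sum(abs(a) for a in alphas)
    lam = E / (K * h * sigma)
    growth = dominant_root(alphas, p, K, h) ** n
    return delta * growth + lam * (growth - 1.0)

# Euler's formula: alpha_{-1} = 0, alpha_0 = 1, p = 0, for which beta_0 = 1 + K*h.
print(dominant_root([0.0, 1.0], 0, 1.0, 0.1))
print(error_bound(10, 0.0, 1e-3, [0.0, 1.0], 0, 1.0, 0.1))
```

For Euler's formula the root is exactly $1 + Kh$, which gives a convenient check on the bisection before the routine is applied to formulas with more coefficients.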

Of the three constants $\delta$, $E$, and $K$ needed for the application of this error
estimate, the first may be estimated initially; and it represents the maximum
roundoff error associated with the initial values determined before the stepwise
calculation is begun, if those values are correctly determined to the number of
places retained.

The constant $E$ comprises the maximum error introduced in one step
because of roundoff and truncation. The latter effect cannot be estimated in
advance unless $F(x, y)$ is of a particularly convenient form, but it can be estimated
in the course of the calculation by approximating the factor $h^m y^{(m)}(\xi)$ by
$h\,\Delta^{m-1}y'$ in the truncation-error term, or by making use of the quantity $\gamma_n$
defined in Sec. 6.6 if one of the methods described in that section is employed.


† If $\partial F/\partial y$ is known to be negative throughout the calculation, a less conservative
bound often can be obtained in a correspondingly simple form. For example, see
Probs. 21 and 22.

The constant $K$ can be calculated in advance if the equation is linear,
since then $\partial F/\partial y$ is independent of $y$, and it can be estimated in advance (assuming
that an analytical expression for $\partial F/\partial y$, in terms of $x$ and $y$, can be obtained)
if the range of values of $y$ can be estimated initially. Otherwise, sample values
of $\partial F/\partial y$ can be tabulated as the calculation proceeds. Thus, for example, in
the case of (6.4.5) we have $\partial F/\partial y = -1$, and hence $K = 1$. For the equation
$y' = x^2 + y^2$, $K$ would be estimated as the largest value of $2|y|$ encountered
in the calculation.†

The maximum effects of errors due to truncation and to roundoff can 
be treated separately. However, because of the more or less random fluctuation 
in sign of errors due to roundoff, any upper bound on the overall effect of a 
large number of roundoffs, no matter how precise, is likely to be extremely 
conservative in any actual calculation. On the other hand, the statistical 
analysis of such effects in stepwise integration is rather involved (see Henrici 
[1962]) and, in any case, can afford only the probability that the overall effect 
of roundoff errors will not exceed a certain amount. 


6.9 Equations of Higher Order. Sets of Equations 


In order to apply one of the preceding methods to a differential equation of higher
order, it is often convenient first to replace that equation by an equivalent set 
of equations of the first order. We here illustrate the procedure only in the case 
of a second-order equation, after which the generalization to higher-order 
equations, or to sets of simultaneous equations of more general type, will be 
obvious. 

The problem 


$$y'' = G(x, y, y') \qquad y(x_0) = y_0 \qquad y'(x_0) = y'_0 \tag{6.9.1}$$

is equivalent to the problem

$$\begin{aligned} y' &= u & y(x_0) &= y_0 \\ u' &= G(x, y, u) & u(x_0) &= y'_0 \end{aligned} \tag{6.9.2}$$

which is, in turn, a specialization of the more general problem in which $u$ is
replaced by, say, $F(x, y, u)$ in the right-hand member of the first equation.

† It should be noticed that the estimate $\partial F/\partial y \approx \Delta y'/\Delta y$ (which has been suggested
in the literature) is not generally significant. Whereas we do have the relation

$$\Delta y'_n = [F(x_{n+1}, y_{n+1}) - F(x_{n+1}, y_n)] + [F(x_{n+1}, y_n) - F(x_n, y_n)]
\approx F_y(x_{n+1}, y_n)\,\Delta y_n + hF_x(x_n, y_n)$$

there is no reason to suppose that the last term is small relative to $\Delta y'_n$.

It is usually convenient, but not necessary, to use the same formula in
dealing with the two equations in (6.9.2). The approximate formulation then
comprises two relations which are expressible in the general form

$$y_{n+1} = y_{n-p} + h(\alpha_{-1}y'_{n+1} + \alpha_0 y'_n + \alpha_1 y'_{n-1} + \cdots + \alpha_r y'_{n-r}) \tag{6.9.3}$$

and

$$u_{n+1} = u_{n-p} + h(\alpha_{-1}u'_{n+1} + \alpha_0 u'_n + \alpha_1 u'_{n-1} + \cdots + \alpha_r u'_{n-r}) \tag{6.9.4}$$

or in equivalent forms in terms of backward differences, with

$$y'_n = u_n \tag{6.9.5}$$

and

$$u'_n = G(x_n, y_n, u_n) \tag{6.9.6}$$


In any case, the relations (6.9.3) and (6.9.4) apply only for $n \ge r$, the
values $y_0$ and $u_0$ being given and the values $y_1, \ldots, y_r$ and $u_1, \ldots, u_r$ being
obtained by another method (such as the use of Taylor series or of one of the
methods to be given in Sec. 6.15). The values of $u'_0, u'_1, \ldots, u'_r$ are calculated
in advance from (6.9.6). If the formula is of open type, so that $\alpha_{-1} = 0$, $y_{r+1}$
and $u_{r+1}$ are then calculated directly by use of (6.9.3) and (6.9.4). Next $y'_{r+1}$ is
given immediately as $u_{r+1}$, and $u'_{r+1}$ is calculated as $G(x_{r+1}, y_{r+1}, u_{r+1})$, so
that data are then available for advancing by another step.

If the formula is of closed type, so that a_, # 0, an initial prediction 
u), is first obtained (by use of a supplementary formula of open type, by 
pure estimation, or otherwise) and γί3), is obtained by replacing γι... by ul, 
in (6.9.3). Next u/) is obtained from (6.9.6), with y and wu replaced by their 
zeroth approximations, and the cycle is closed by calculating u\), from (6.9.4). 
If the calculated value differs from the initial prediction uS{),, the cycle may be 
iterated until agreement is obtained, when the iteration converges. The next 
step is then taken in the same way. 

The iteration is thus described by the equations

$$\begin{aligned} y_{n+1}^{(i)} &= y_{n-p} + h(\alpha_{-1}u_{n+1}^{(i)} + \cdots) \\ u_{n+1}^{\prime(i)} &= G(x_{n+1}, y_{n+1}^{(i)}, u_{n+1}^{(i)}) \\ u_{n+1}^{(i+1)} &= u_{n-p} + h(\alpha_{-1}u_{n+1}^{\prime(i)} + \cdots) \end{aligned} \tag{6.9.7}$$

where $y_{n-p}$, $u_{n-p}$, and all omitted terms remain fixed throughout the iteration.
There then follows also

$$y - y^{(i)} = h\alpha_{-1}(u - u^{(i)}) \tag{6.9.8}$$

and

$$u - u^{(i+1)} = h\alpha_{-1}[G(x, y, u) - G(x, y^{(i)}, u^{(i)})] \tag{6.9.9}$$

where the common subscript $n + 1$ is suppressed throughout. Now if, near
$(x_{n+1}, y_{n+1}, u_{n+1})$, it is true that

$$|G_y(x, y, u)| \le K_{n+1} \qquad |G_u(x, y, u)| \le L_{n+1} \tag{6.9.10}$$

then we may deduce from (6.9.9) the inequality

$$|u - u^{(i+1)}| \le h|\alpha_{-1}|\,[K_{n+1}|y - y^{(i)}| + L_{n+1}|u - u^{(i)}|] \tag{6.9.11}$$

and hence, making use of (6.9.8),

$$|u - u^{(i+1)}| \le h|\alpha_{-1}|\,(h|\alpha_{-1}|K_{n+1} + L_{n+1})\,|u - u^{(i)}| \tag{6.9.12}$$

Thus, convergence is ensured if $h$ is so chosen that

$$h|\alpha_{-1}|\,(h|\alpha_{-1}|K_{n+1} + L_{n+1}) < 1 \tag{6.9.13}$$

and the convergence factor $\rho_n$ in the $n$th step is such that

$$|\rho_n| \le h|\alpha_{-1}|\,(h|\alpha_{-1}|K + L) \tag{6.9.14}$$

where $K$ and $L$ are upper bounds on $|G_y|$ and $|G_u|$. If $G$ does not explicitly
involve $u = y'$, it is seen that the convergence factor is of second order in $h$.

† As in the first-order case, it is true here that, on the average, iteration beyond a first
correction is as likely to worsen the approximation as to improve it when the iteration
converges.

Whether or not iteration beyond a first correction is contemplated (and
especially if it is not!), the spacing $h$ should be chosen so that $|\rho_n| \ll 1$ in the
initial computation (and modified as necessary in following steps) since otherwise
the stepwise truncation-error estimates are invalid and, more importantly,
unfavorable propagation of truncation errors may be anticipated.
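
The bound (6.9.14) is easy to evaluate before a spacing is committed to. The following is a minimal sketch, not part of the original text: the function name `convergence_bound` is hypothetical, and the sample values are taken from the example (6.9.15) treated later in this section, for which $G = y + xu$, so that $G_y = 1$ and $G_u = x$, with $\alpha_{-1} = \tfrac12$ assumed for the one-difference closed Adams formula.

```python
def convergence_bound(h, alpha_m1, K, L):
    """Upper bound (6.9.14) on the convergence factor of the corrector iteration.

    h        : spacing
    alpha_m1 : coefficient alpha_{-1} of the closed formula
    K, L     : upper bounds on |G_y| and |G_u|
    """
    a = h * abs(alpha_m1)
    return a * (a * K + L)


# Illustrative values (assumed): G(x, y, u) = y + x*u, so G_y = 1 and G_u = x.
h = 0.1
alpha_m1 = 0.5
bound_at_x0 = convergence_bound(h, alpha_m1, K=1.0, L=0.0)  # at x = 0
bound_at_x2 = convergence_bound(h, alpha_m1, K=1.0, L=0.2)  # at x = 0.2
print(bound_at_x0, bound_at_x2)
```

Both values are far below unity, which is the situation the preceding paragraph asks for.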

In order to illustrate the procedures, we consider the simple problem

$$y'' = y + xy' \qquad y(0) = 1 \qquad y'(0) = 0 \tag{6.9.15}$$

For the purpose of starting the calculation, we first obtain the expressions

$$y'' = y + xy', \quad y''' = 2y' + xy'', \quad y^{iv} = 3y'' + xy''', \quad \ldots$$

and hence, with $x_0 = 0$,

$$y_0 = 1, \quad y_0' = 0, \quad y_0'' = 1, \quad y_0''' = 0, \quad y_0^{iv} = 3, \quad y_0^{v} = 0, \quad y_0^{vi} = 15, \quad \ldots$$

so that, with $h = 0.1$, there follows

$$\begin{aligned} y_s &= 1 + \frac12\left(\frac{s}{10}\right)^2 + \frac18\left(\frac{s}{10}\right)^4 + \frac{1}{48}\left(\frac{s}{10}\right)^6 + \cdots \\ y_s' &= \frac{s}{10} + \frac12\left(\frac{s}{10}\right)^3 + \frac18\left(\frac{s}{10}\right)^5 + \cdots \end{aligned}$$

Thus we may obtain, in particular,

$$\begin{aligned} y_1 &= 1.0050, & y_2 &= 1.0202, & y_3 &= 1.0461, \\ y_1' &= 0.1005, & y_2' &= 0.2040, & y_3' &= 0.3138, \end{aligned} \tag{6.9.16}$$

if only four places are retained. For the purpose of simplicity, we here make
use only of the calculated values of $y_1$ and $y_1'$, and proceed by using a formula
involving only first differences. The preliminary calculation then can be arranged
as follows:


 x   |   y    | y' = u | ∇y'  | ∇²y' | y'' = y + xu | ∇y'' | ∇²y''
 0.0 | 1.0000 | 0.0000 |  -   |  -   |    1.0000    |  -   |  -
 0.1 | 1.0050 | 0.1005 | 1005 |  -   |    1.0150    | 150  |  -

If the Adams formula of the open type is used with first differences, there
follows

$$\begin{aligned} y_2 &\approx 1.0050 + 0.1[0.1005 + \tfrac12(0.1005)] = 1.0201 \\ u_2 &\approx 0.1005 + 0.1[1.0150 + \tfrac12(0.0150)] = 0.2028 \end{aligned}$$

and the third line of the calculation is


 0.2 | 1.0201 | 0.2028 | 1023 | 18 | 1.0607 | 457 | 307          (6.9.17)


Since a second difference $\nabla^2y''$ of about 300 units would contribute about
$\tfrac{5}{12}\cdot\tfrac{1}{10}\cdot 300 \approx 12$ units to $y'$, while a second difference $\nabla^2y'$ of about 18 units
would contribute about 0.7 units to $y$, if we suppose that the neglected second
differences relative to $x_1$ are of the same order of magnitude as those calculated
here, we may consider these quantities as rough estimates of the truncation
errors in $y_2$ and $u_2$ introduced in the step from $x_1$ to $x_2$. (Further information
with regard to the reliability of these estimates would be afforded, in succeeding
steps, by a consideration of the extent to which the second differences remain
constant.) If such errors are not tolerable, and this method is to be used, it is
then necessary to calculate one or more additional starting values of $y$ and $u$,
and to retain at least one more difference. In this connection, it should be
kept in mind that the errors introduced in each step are propagated into
succeeding calculations, as was seen in the first-order case in Secs. 6.7 and 6.8,
in a manner which depends both upon the problem involved and the integration
formula employed.
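
The open-formula step just described can be sketched in a few lines. This is an illustrative sketch only: the helper name `adams_open_first_diff` is hypothetical, and full floating-point precision is carried instead of the four places retained in the text, so the results should be expected to agree with 1.0201 and 0.2028 only after rounding.

```python
def G(x, y, u):
    """Right-hand side of u' = G(x, y, u) for the example y'' = y + x*y'."""
    return y + x * u


def adams_open_first_diff(v_n, d_n, d_nm1, h):
    """Advance v by one step with v_{n+1} = v_n + h*(1 + (1/2)*nabla) v'_n.

    d_n and d_nm1 are the derivative values v' at x_n and x_{n-1}.
    """
    return v_n + h * (d_n + 0.5 * (d_n - d_nm1))


h = 0.1
x = [0.0, 0.1]
y = [1.0000, 1.0050]   # starting ordinates, from (6.9.16)
u = [0.0000, 0.1005]   # starting slopes u = y'
du = [G(xi, yi, ui) for xi, yi, ui in zip(x, y, u)]  # u' = y''

y2 = adams_open_first_diff(y[1], u[1], u[0], h)
u2 = adams_open_first_diff(u[1], du[1], du[0], h)
print(round(y2, 4), round(u2, 4))
```

The same helper serves both equations of the pair, which reflects the remark that it is usually convenient to apply one formula to $y$ and to $u$.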

If, instead, the Adams formula of closed type with first differences is
used as a corrector, with the open-type formula as a predictor, the value 0.2028
is obtained, as in the preceding method, as the zeroth approximation
$u_2^{(0)} = y_2^{\prime(0)}$. The corresponding difference $\nabla y_2'$ is then entered, the prediction $y_2^{(0)}$ is
determined by the formula

$$y_2^{(0)} = 1.0050 + 0.1[0.2028 - \tfrac12(0.1023)] = 1.0202$$

and $u_2^{\prime(0)}$ is determined as $y_2^{(0)} + x_2y_2^{\prime(0)} = 1.0608$, so that the third line
takes the form


 0.2 | 1.0202 | 0.2028 | 1023 | - | 1.0608 | 458 | -          (6.9.18)


Next the cycle is closed by calculating

$$u_2^{(1)} = 0.1005 + 0.1[1.0608 - \tfrac12(0.0458)] = 0.2043$$

Since this corrected result differs from the initial prediction, the entry
0.2028 in the third line is altered to 0.2043 and the cycle may be repeated, at the
end of which the third line has been changed to


 0.2 | 1.0202 | 0.2043 | 1038 | 33 | 1.0611 | 461 | 311          (6.9.19)


Finally, the value $u_2^{(2)}$ is calculated and is found to agree with $u_2^{(1)}$ to four places,
so that the iteration is completed.
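
The predictor-corrector cycle for this step can be sketched as follows. The sketch is illustrative and makes two assumptions not in the text: full floating-point precision is carried, and the iteration is stopped when successive values of $u_2$ differ by less than half a unit in the fourth place. With first differences only, the closed Adams formula $y_{n+1} = y_n + h(1 - \tfrac12\nabla)y_{n+1}'$ reduces to the trapezoidal rule, which is the form used in the code.

```python
def G(x, y, u):
    """Right-hand side of u' = G(x, y, u) for y'' = y + x*y'."""
    return y + x * u


h = 0.1
x1, y1, u1 = 0.1, 1.0050, 0.1005   # starting values from (6.9.16)
x2 = 0.2
du0, du1 = 1.0, G(x1, y1, u1)      # u' at x_0 and x_1

# Predictor: open Adams formula with first differences.
u2 = u1 + h * (du1 + 0.5 * (du1 - du0))

history = [u2]
for _ in range(10):
    # Corrector cycle: closed Adams formula with first differences.
    y2 = y1 + 0.5 * h * (u2 + u1)
    du2 = G(x2, y2, u2)
    u2_new = u1 + 0.5 * h * (du2 + du1)
    history.append(u2_new)
    converged = abs(u2_new - u2) < 0.5e-4
    u2 = u2_new
    if converged:
        break

y2 = y1 + 0.5 * h * (u2 + u1)
# Observed ratio of successive changes, to compare with the bound (6.9.14).
ratio = (history[2] - history[1]) / (history[1] - history[0])
print(round(y2, 4), round(u2, 4), ratio)
```

Because $G$ is linear in $y$ and $u$ here, the observed ratio of successive changes coincides with the bound $h|\alpha_{-1}|(h|\alpha_{-1}|K + L)$ evaluated with $K = 1$ and $L = x_2 = 0.2$.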

The fact that $h = 0.1$ is an appropriate spacing here could have been
established in advance by noticing that since $G_u = x$ and $G_y = 1$, and since
$\alpha_{-1} = \tfrac12$ for the Adams one-difference corrector, the bound (6.9.14) on the initial
convergence factor $\rho_0$ (with $x = 0$) is approximately $h^2/4$.

Reference to (6.3.2) shows that incorporation of the second differences
would contribute $-\tfrac{1}{12}\cdot\tfrac{1}{10}\cdot 311 \approx -3$ units to $y_2'$ and $-\tfrac{1}{12}\cdot\tfrac{1}{10}\cdot 33 \approx 0$
units to $y_2$. A somewhat more dependable estimate of the truncation error
introduced in a single step can be obtained by calculating the prediction $y_{n+1}^{(0)}$
by use of the open formula, in place of the closed one, but using the closed
formula to supply at least one corrected value of $y_{n+1}$. Since also $u_{n+1}^{(0)}$ is
calculated by the open formula, it then follows that if we write

$$\gamma_{n+1} = y_{n+1} - y_{n+1}^{(0)} \qquad \gamma_{n+1}' = u_{n+1} - u_{n+1}^{(0)}$$

where $y_{n+1}$ and $u_{n+1}$ are corrected values provided by the closed formula,
then the desired truncation errors in $y_{n+1}$ and $u_{n+1}$ are approximated respectively
by $-\gamma_{n+1}/C$ and $-\gamma_{n+1}'/C$, where $C$ is the numerical factor considered in
Sec. 6.6, here equal to 6 (see also Prob. 27). It is convenient to tabulate $\gamma_n$
and $\gamma_n'$ in place of the two first neglected differences, so that the line (6.9.19)
then is replaced by


 0.2 | 1.0202 | 0.2043 | 1038 | 1 | 1.0611 | 461 | 15


The estimated truncation errors introduced into the calculated values of $y_2'$
and $y_2$ are then $-\tfrac{15}{6} \approx -3$ units and $-\tfrac16 \approx 0$ units, as before.

It may be noticed that the actual errors, obtainable by reference to the 
rounded true values given in (6.9.16), are indeed correctly predicted, in this 
case, by the estimates afforded by both procedures. In a fairly lengthy sequence 
of steps, however, the propagation of errors becomes important. A treatment 
analogous to that of Sec. 6.8 is rather unpleasant in the case of a second-order 
equation and is omitted. 

In this connection, however, it may be remarked that an elementary
analysis quite similar to that of Sec. 6.7 permits a simple stability study of the
use of (6.9.3) and (6.9.4) in the numerical solution of an equation of the special
form

$$y'' = Ly' + Ky \tag{6.9.20}$$

where $L$ and $K$ are constants. Here the exact solution is of the form

$$y(x) = c_1e^{\lambda_1x} + c_2e^{\lambda_2x} \tag{6.9.21}$$

where $\lambda_1$ and $\lambda_2$ are the roots of $\lambda^2 - L\lambda - K = 0$ and $c_1$ and $c_2$ are determined
by the initial conditions, and it is again found that the use of formulas
(6.9.3) and (6.9.4) with $p > 0$ introduces parasitic solutions which may dominate
the part of the solution which simulates the exact solution when $\lambda_1$ and $\lambda_2$
are negative or have negative real parts. When $p = 0$, this situation can exist
only when excessively large spacings are employed.

In particular, if use is made of Milne's method, based on (6.6.23) and
corresponding to $p = 1$, the generated numerical solution is found to be
approximated by

$$C_1e^{\lambda_1x_n} + C_2e^{\lambda_2x_n} + (-1)^n[C_3e^{-\lambda_1x_n/3} + C_4e^{-\lambda_2x_n/3}] \tag{6.9.22}$$

when $x = x_n$, if the spacing $h$ is small and if roundoff errors are neglected,
where the $C$'s are determined by the starting values. Thus, for example, if the
true solution is of the form

$$y(x_n) = ce^{-ax_n}\cos(bx_n + \omega)$$

where $a > 0$, then the parasitic part of the numerical solution will be
approximated by

$$(-1)^nCe^{ax_n/3}\cos\left(\tfrac13bx_n + \omega'\right)$$

and will tend to dominate the desirable part of the numerical solution when $x$
is large.

It is also important to notice that the propagated error generally will
possess components simulating both the terms $e^{\lambda_1x}$ and $e^{\lambda_2x}$, even when the
parasitic solutions are not troublesome. This situation is of particular
disadvantage when the initial conditions require that the exact solution involve
only the term which grows least rapidly (or decays most rapidly).

When the governing differential equation $y'' = G(x, y, y')$ is less simple
than (6.9.20), a qualitative analysis of short-range error propagation near a
point $(x_i, y_i)$ generally can be obtained by identifying $L$ and $K$ in (6.9.20)
with the values of $\partial G/\partial y'$ and $\partial G/\partial y$, respectively, at that point, if those partial
derivatives do not vary excessively near that point.

The generalization of the preceding treatments to a pair of equations of
the form

$$\begin{aligned} y' &= F(x, y, u) & y(x_0) &= y_0 \\ u' &= G(x, y, u) & u(x_0) &= u_0 \end{aligned} \tag{6.9.23}$$

or to a more general set of simultaneous differential equations is straightforward
in principle and in numerical interpretation. However, the complexity of the
relevant analysis of propagated errors and stability may be forbidding.


6.10 Special Second-Order Equations 


Second-order equations of the special form

$$y'' = G(x, y) \tag{6.10.1}$$


in which y’ is not explicitly involved, arise rather frequently in practice. If 
values of y’ are not required, it is desirable to have available methods which 
do not entail their calculation. 

Two formulas having this property were derived, as formulas (5.5.12)
and (5.5.13) for repeated integration, and may be written with the present
notation in the forms

$$y_{n+1} = 2y_n - y_{n-1} + h^2\left(1 + 0\nabla + \tfrac{1}{12}\nabla^2 + \tfrac{1}{12}\nabla^3 + \tfrac{19}{240}\nabla^4 + \tfrac{3}{40}\nabla^5 + \cdots\right)y_n'' \tag{6.10.2}$$

and

$$y_{n+1} = 2y_n - y_{n-1} + h^2\left(1 - \nabla + \tfrac{1}{12}\nabla^2 + 0\nabla^3 - \tfrac{1}{240}\nabla^4 - \tfrac{1}{240}\nabla^5 - \cdots\right)y_{n+1}'' \tag{6.10.3}$$

The former is of open type, the latter of closed type. In order to use either, a
suitable number of preliminary ordinates must be calculated by another method,
which takes into account the fact that $y$ and $y'$ are prescribed at $x = x_0$, after
which the technique of the ensuing calculation, often known as Störmer's
method (Störmer [1921]), is evident.




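Formulas (6.10.2) and (6.10.3) are each representative of a whole class of
similar formulas, one class of open type and the other of closed type, which
are analogous to the formulas given in Secs. 6.2 and 6.3. In particular, an
additional formula of open type,

$$y_{n+1} = y_n + y_{n-2} - y_{n-3} + 3h^2\left(1 - \nabla + \tfrac{5}{12}\nabla^2 + 0\nabla^3 + \tfrac{17}{720}\nabla^4 + \tfrac{17}{720}\nabla^5 + \cdots\right)y_n'' \tag{6.10.4}$$

may be listed (see Prob. 16 of Chap. 5).

Formulas (6.10.3) and (6.10.4) comprise a pair of formulas for both of
which the coefficient of the third difference vanishes. If only second differences
are retained in these formulas, they become

$$\begin{aligned} y_{n+1} &= y_n + y_{n-2} - y_{n-3} + 3h^2\left(y_n'' - \nabla y_n'' + \tfrac{5}{12}\nabla^2y_n''\right) + \frac{17h^6}{240}y^{vi}(\xi) \\ &= y_n + y_{n-2} - y_{n-3} + \frac{h^2}{4}(5y_n'' + 2y_{n-1}'' + 5y_{n-2}'') + \frac{17h^6}{240}y^{vi}(\xi) \end{aligned} \tag{6.10.5}$$

and

$$\begin{aligned} y_{n+1} &= 2y_n - y_{n-1} + h^2\left(y_{n+1}'' - \nabla y_{n+1}'' + \tfrac{1}{12}\nabla^2y_{n+1}''\right) - \frac{h^6}{240}y^{vi}(\xi) \\ &= 2y_n - y_{n-1} + \frac{h^2}{12}(y_{n+1}'' + 10y_n'' + y_{n-1}'') - \frac{h^6}{240}y^{vi}(\xi) \end{aligned} \tag{6.10.6}$$

where the coefficients of the remainder terms are the same as those of the
omitted fourth differences in the formulas (6.10.3) and (6.10.4). The error term
when one of the formulas is truncated with a difference not preceding one with a
zero coefficient is of more complicated form.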

Milne's method employs (6.10.5) to afford an initial prediction $y_{n+1}^{(0)}$
and (6.10.6) as the corrector. If the factor $\gamma_{n+1} = y_{n+1} - y_{n+1}^{(0)}$ is calculated
(as in Sec. 6.6), the estimated truncation error in the $n$th step is $T_n \approx -\gamma_{n+1}/18$.
Also, the convergence factor in the iteration at the $n$th step (see Sec. 6.6) is
easily found to be approximately $\rho_n \approx \tfrac{1}{12}h^2G_y(x_n, y_n)$, so that $h$ should be
sufficiently small to ensure that $\tfrac{1}{12}h^2|G_y| \ll 1$.

In order to illustrate the relevant analysis of error propagation, we consider
the special case in which (6.10.6) is the basis of the method.† With the same
notation as was used in earlier developments, it is easily seen that the error $\varepsilon_n$
associated with the calculated value $y_n$ satisfies a relation of the form

$$\varepsilon_{n+1} = 2\varepsilon_n - \varepsilon_{n-1} + \frac{h^2}{12}(G_{y_{n+1}}\varepsilon_{n+1} + 10G_{y_n}\varepsilon_n + G_{y_{n-1}}\varepsilon_{n-1}) + E_n \tag{6.10.7}$$

for $n \ge 1$, where $G_{y_k}$ is an appropriate value of $G_y$. This relation can also be
written in the form

$$\left(1 - \frac{h^2}{12}G_{y_{n+1}}\right)(\varepsilon_{n+1} - \varepsilon_n) = \varepsilon_n - \varepsilon_{n-1} + \frac{h^2}{12}[(G_{y_{n+1}} + 10G_{y_n})\varepsilon_n + G_{y_{n-1}}\varepsilon_{n-1}] + E_n \tag{6.10.8}$$

so that, if we have

$$|G_y(x, y)| \le K \tag{6.10.9}$$

for all relevant values of $x$ and $y$, and if $h$ is sufficiently small to ensure that

$$\frac{Kh^2}{12} < 1 \tag{6.10.10}$$

there follows

$$\left(1 - \frac{Kh^2}{12}\right)|\varepsilon_{n+1} - \varepsilon_n| \le |\varepsilon_n - \varepsilon_{n-1}| + \frac{Kh^2}{12}(11|\varepsilon_n| + |\varepsilon_{n-1}|) + |E_n| \tag{6.10.11}$$

If $e_n$ satisfies the relation

$$\left(1 - \frac{Kh^2}{12}\right)(e_{n+1} - e_n) = e_n - e_{n-1} + \frac{Kh^2}{12}(11e_n + e_{n-1}) + E \tag{6.10.12}$$

where

$$E = |E_n|_{\max} \tag{6.10.13}$$

and if

$$e_0 = |\varepsilon_0| \qquad e_1 - e_0 = |\varepsilon_1 - \varepsilon_0| \tag{6.10.14}$$

there follows also $e_1 - e_0 \ge |\varepsilon_1| - |\varepsilon_0|$ and hence $e_1 - |\varepsilon_1| \ge e_0 - |\varepsilon_0|$ or
$e_1 \ge |\varepsilon_1|$. Then, by comparing (6.10.11) and (6.10.12), we find also that
$e_2 - e_1 \ge |\varepsilon_2 - \varepsilon_1|$ and hence $e_2 - e_1 \ge |\varepsilon_2| - |\varepsilon_1|$ or $e_2 \ge |\varepsilon_2|$. By induction,
there follows $e_n - e_{n-1} \ge |\varepsilon_n - \varepsilon_{n-1}|$ and $e_n \ge |\varepsilon_n|$ for $n = 0, 1, 2, \ldots$, so
that the error $\varepsilon_n$ is then dominated by $e_n$.

† The method used for prediction is irrelevant to this analysis.


The general solution of (6.10.12) is readily found to be of the form

$$e_n = A_0\beta_0^n + A_1\beta_1^n - \frac{E}{Kh^2} \tag{6.10.15}$$

where $\beta_0$ and $\beta_1$ are the roots of the equation

$$\left(1 - \frac{Kh^2}{12}\right)\beta^2 - 2\left(1 + \frac{5Kh^2}{12}\right)\beta + \left(1 - \frac{Kh^2}{12}\right) = 0 \tag{6.10.16}$$

and hence

$$\beta_0 = \frac{1}{\beta_1} = \frac{1 + \frac{5}{12}Kh^2 + \sqrt{Kh^2 + \frac16K^2h^4}}{1 - \frac{1}{12}Kh^2} = 1 + \sqrt{K}\,h + O(h^2) \tag{6.10.17}$$

When $A_0$ and $A_1$ are determined by the conditions $e_0 = 0$ and $e_1 = |\varepsilon_1|$,
under the assumption that $\varepsilon_0 = 0$, there follows finally

$$|\varepsilon_n| \le e_n$$

where

$$e_n = \frac{E}{Kh^2}\left[\frac{1}{1 + \beta_0}(\beta_0^n + \beta_0^{-n+1}) - 1\right] + \frac{\beta_0|\varepsilon_1|}{\beta_0^2 - 1}(\beta_0^n - \beta_0^{-n}) \tag{6.10.18}$$

If roundoff errors are ignored, we have $E \le \tfrac{1}{240}h^6|y^{vi}|_{\max}$. Also, since
$\beta_0 = 1 + \sqrt{K}\,h + O(h^2)$, and $n = (x_n - x_0)/h$, there follows

$$e_n \approx \frac{h^4|y^{vi}|_{\max}}{240K}\{\cosh[\sqrt{K}(x_n - x_0)] - 1\} + \frac{|\varepsilon_1|}{\sqrt{K}\,h}\sinh[\sqrt{K}(x_n - x_0)] \tag{6.10.19}$$

when $h$ is small. Whereas $|y^{vi}|_{\max}$ generally is not easy to estimate directly,
the factor $[h^4|y^{vi}|_{\max}]/240$ can be estimated as $h^{-2}|T_n|_{\max} \approx |\gamma_n|_{\max}/18h^2$ or as
$|\nabla^4y_n''|_{\max}/240$. (Similar but more involved error bounds can be derived in the
more general case.)
In order to illustrate the calculation and to provide a basis for the
considerations of the following section, we apply Milne's method, based on (6.10.5)
and (6.10.6), to the problem

$$y'' = xy \qquad y(0) = 0 \qquad y'(0) = 1 \tag{6.10.20}$$

for which the exact solution is expressible in the form

$$y = 3^{-2/3}\,\Gamma(\tfrac13)\,x^{1/2}I_{1/3}(\tfrac23x^{3/2}) \tag{6.10.21}$$

where $I_{1/3}$ is the modified Bessel function of the first kind of order $\tfrac13$. With
$h = 0.1$, the calculation can be arranged as in Table 6.3, if differences are used
and if five places are retained. The first five lines are easily calculated in advance
(only three lines are needed), the ordinates being determined by use of a single
Taylor series, and the values of $y''$, determined from the equation $y'' = xy$,
and of the differences are entered as shown. If (6.10.5) is used to predict $y_5$, the
prediction is found to be 0.50522; the remainder of the sixth line is then filled
in, after which (6.10.6) gives the corrected value 0.50523, and the resultant slight
modification in the remainder of the line does not call for additional iteration.
The value $\gamma_5 = 0.50523 - 0.50522$ is then listed as +1 unit in the fifth place,
and the calculation proceeds in the same way in succeeding steps.


Table 6.3

  x      y          y''        ∇y''     ∇²y''    γ
 0.0    0.00000    0.00000
 0.1    0.10001    0.01000     1000
 0.2    0.20013    0.04003     3003     2003
 0.3    0.30068    0.09020     5017     2014
 0.4    0.40214    0.16086     7066     2049
 0.5    0.50523    0.25262     9176     2110      1
 0.6    0.61086    0.36652    11390     2214     -1
 0.7    0.72017    0.50412    13760     2370     -1
 0.8    0.83454    0.66763    16351     2591     -1


Since the truncation error in each step may be estimated as $-\gamma/18$, we
may be reasonably confident of the calculated values of $y$ to the places retained
(except for the usual uncertainty of one unit in the last place, due to roundoff).
In fact, the small values of $\gamma$ may be expected to correspond to the effects of
roundoff.
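
The calculation of Table 6.3 can be sketched with the ordinate forms of (6.10.5) and (6.10.6). The sketch below is illustrative rather than a transcription of the tabular procedure: it carries full floating-point precision instead of five places, it starts from the series $y = \sum a_kx^{3k+1}$ with $a_0 = 1$ and $a_k = a_{k-1}/[(3k+1)(3k)]$ (obtained by substituting a power series into $y'' = xy$), and it iterates the corrector a fixed three times, which is an assumption made for simplicity.

```python
def G(x, y):
    """Right-hand side of the special second-order equation y'' = G(x, y) = x*y."""
    return x * y


def series_solution(x, terms=6):
    """Power series for y'' = x*y with y(0) = 0, y'(0) = 1."""
    total, a = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            a /= (3 * k + 1) * (3 * k)
        total += a * x ** (3 * k + 1)
    return total


h = 0.1
n_start, n_end = 4, 8
xs = [i * h for i in range(n_end + 1)]
ys = [series_solution(x) for x in xs[: n_start + 1]]   # starting ordinates
d2 = [G(x, y) for x, y in zip(xs, ys)]                 # corresponding y''
gammas = {}

for n in range(n_start, n_end):
    # Predictor (6.10.5), ordinate form.
    y_pred = (ys[n] + ys[n - 2] - ys[n - 3]
              + h * h / 4 * (5 * d2[n] + 2 * d2[n - 1] + 5 * d2[n - 2]))
    # Corrector (6.10.6), ordinate form, iterated a fixed number of times.
    y_new = y_pred
    for _ in range(3):
        y_new = (2 * ys[n] - ys[n - 1]
                 + h * h / 12 * (G(xs[n + 1], y_new) + 10 * d2[n] + d2[n - 1]))
    gammas[n + 1] = y_new - y_pred      # gamma, corrector minus predictor
    ys.append(y_new)
    d2.append(G(xs[n + 1], y_new))

for x, y in zip(xs, ys):
    print(f"{x:.1f}  {y:.5f}")
```

Without five-place rounding the values of $\gamma$ come out well below one unit in the fifth place, which is consistent with the remark that the tabulated $\gamma$ entries mainly reflect roundoff.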

Clearly, the alternative forms of (6.10.5) and (6.10.6) may be used instead,
without the need for calculation and recording (or storing) of differences.
Further, in place of using (6.10.5) to obtain an initial prediction for $y_{n+1}$, it is
possible to estimate the second difference $\nabla^2y_{n+1}''$ and then to fill in the remainder
of that line from right to left through $y_{n+1}$, after which use may be made of
(6.10.6) to initiate the correction. Thus, in Table 6.3, a glance at the $\nabla^2y''$
column would suggest the estimate $\nabla^2y_8'' \approx 0.026$ after the calculation of $y_7$.
However, this procedure would not supply data for the $\gamma$ column.


It may be noticed that any linear equation of the second order of the form

$$Y'' + P(x)Y' + Q(x)Y = F(x) \tag{6.10.22}$$

can be reduced to the form (6.10.1) by the change of variables

$$Y(x) = e^{-\frac12\int P(x)\,dx}\,y(x) \tag{6.10.23}$$

in accordance with which (6.10.22) takes the form

$$y'' + f(x)y = g(x) \tag{6.10.24}$$

where

$$f(x) = \tfrac14\{4Q(x) - 2P'(x) - [P(x)]^2\} \qquad g(x) = e^{\frac12\int P(x)\,dx}F(x) \tag{6.10.25}$$


6.11 Change of Interval


In many situations it is desirable to, say, double or halve the spacing at a certain 
stage of the advancing calculation. Doubling the spacing presents no difficulties, 
since it involves only the use of alternate values of previously calculated data, 
together with a direct calculation of modified differences relevant to the new 
spacing, if differences are used. 

Thus, in illustration, the smallness of the entries in the $\gamma$ column of Table
6.3 suggests that the same accuracy may be obtained with a doubled spacing
$h' = 2h = 0.2$. In fact, reference to the error expression in (6.10.6) shows
that the truncation error in each step can be estimated roughly by

$$-\frac{h^2}{240}\nabla^4y''$$

and it is found that, for the data of Table 6.3, $\nabla^4y''$ varies from 0.00024 to
0.00065, so that the largest single truncation error in the range covered is
probably less than about three units in the eighth decimal place. Doubling
the spacing $h$ will multiply the truncation error by a factor of the order $2^6$
and hence may be expected to lead to a truncation error of less than about one
unit in the sixth place in each step. The calculation following the work of
Table 6.3, with doubled spacing, is given in Table 6.4.


Table 6.4

  x      y          y''        ∇y''     ∇²y''    γ
 0.0    0.00000    0.00000
 0.2    0.20013    0.04003     4003
 0.4    0.40214    0.16086    12083     8080
 0.6    0.61086    0.36652    20566     8483
 0.8    0.83454    0.66763    30111     9545
 1.0    1.08531    1.08531    41768    11657     1
 1.2    1.38000    1.65600    57069    15301     3
 1.4    1.74164    2.43830    78230    21161     7


After three lines of calculation, the $\gamma$ column serves a warning that the
truncation error per step may have increased at that stage to about one-half
unit in the last place retained. Thus (as might have been anticipated in advance
from the increasing rate of growth of $\nabla^2y''$) the advantages of the more rapid
calculation were short-lived, and the doubling of the spacing was in fact
ill-advised in the present case. However, the results of Table 6.4 may serve to
illustrate the somewhat more complicated transition to a halved spacing.†

† An obvious alternative consists of merely retaining additional differences in the
relevant integration formulas (6.10.3) and (6.10.4). In the present case, however, it
is assumed that retention of the advantages of the special formulas (6.10.5) and
(6.10.6) is considered to be desirable.

In the present analysis, knowledge of $y(1.3)$ would permit the determination
of $y''(1.3)$ and, consequently, $\nabla y''(1.4)$ and $\nabla^2y''(1.4)$, relative to the new spacing
$h = 0.1$. Then an iteration, based on (6.10.6), could be initiated by estimating
$\nabla^2y''(1.5)$ and proceeding as was outlined in the preceding section. The value of
$y(1.3)$ could be obtained by an interpolation involving certain of the available
calculated ordinates. Clearly, care should be taken to obtain this ordinate to
the same degree of accuracy as the other ordinates. The use of a difference
formula, for this purpose, would entail the calculation of differences of the
ordinates themselves, but would be desirable, in order that the accuracy of the
interpolation could be estimated.

Another procedure consists of using the formulas derived in Sec. 5.7, to
transform the tabulated differences $\nabla y''(1.4)$ and $\nabla^2y''(1.4)$ to corresponding
differences relative to the halved spacing. The ordinate $y(1.3)$ can then be
determined, for example, by rewriting (6.10.6) in the form

$$y_n \approx \tfrac12(y_{n+1} + y_{n-1}) - \frac{h^2}{2}\left(y_{n+1}'' - \nabla y_{n+1}'' + \tfrac{1}{12}\nabla^2y_{n+1}''\right) \tag{6.11.1}$$

If the difference operators corresponding to the halved spacing are denoted by
$\nabla'$ and $\nabla'^2$, Eq. (5.7.4) yields the formulas

$$\nabla' = \tfrac12\nabla + \tfrac18\nabla^2 + \tfrac{1}{16}\nabla^3 + \tfrac{5}{128}\nabla^4 + \tfrac{7}{256}\nabla^5 + \tfrac{21}{1024}\nabla^6 + \tfrac{33}{2048}\nabla^7 + \cdots \tag{6.11.2}$$

and

$$\nabla'^2 = \tfrac14\nabla^2 + \tfrac18\nabla^3 + \tfrac{5}{64}\nabla^4 + \tfrac{7}{128}\nabla^5 + \tfrac{21}{512}\nabla^6 + \tfrac{33}{1024}\nabla^7 + \cdots \tag{6.11.3}$$

when $\Delta$ is replaced by $-\nabla$ in (5.7.4), in accordance with the results of Sec. 5.7.
The use of these formulas permits the calculation of the differences relative
to $x = 1.4$ in the third line of Table 6.5, after which $\nabla y''$ and $y''$ are obtained
in line two and $y''$ in line one. A useful check on the accuracy of the modified
differences is then afforded by a comparison of the value of $y''(1.2)$ so obtained
with that previously obtained in the direct calculation of Table 6.4. The
ordinate $y(1.3)$ is next calculated from (6.11.1).
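
The transformation to the halved spacing can be sketched numerically from the $y''$ column of Table 6.4. The sketch is illustrative: the function names are hypothetical, the series are truncated at the seventh difference (the highest one available from eight tabulated values), and the coefficients are generated from the expansion of $1 - (1 - \nabla)^{1/2}$, whose $k$th coefficient is $\binom{2k-2}{k-1}/(k\,2^{2k-1})$. Since $\nabla'^2 = 2\nabla' - \nabla$, the coefficients in (6.11.3) are twice those in (6.11.2) from the second onward.

```python
from fractions import Fraction
from math import comb


def half_step_coeffs(order):
    """Coefficients c_k in nabla' = sum_k c_k nabla^k, k = 1..order."""
    return [Fraction(comb(2 * k - 2, k - 1), k * 2 ** (2 * k - 1))
            for k in range(1, order + 1)]


def backward_differences(values):
    """Return [nabla f, nabla^2 f, ...] evaluated at the last tabulated point."""
    diffs, row = [], list(values)
    while len(row) > 1:
        row = [b - a for a, b in zip(row, row[1:])]
        diffs.append(row[-1])
    return diffs


# y'' column of Table 6.4 (spacing 0.2), in units of the fifth decimal place.
d2_units = [0, 4003, 16086, 36652, 66763, 108531, 165600, 243830]
nabla = backward_differences(d2_units)      # nabla^1 ... nabla^7 at x = 1.4
c = half_step_coeffs(len(nabla))

nabla1_half = sum(ck * dk for ck, dk in zip(c, nabla))               # (6.11.2)
nabla2_half = sum(2 * ck * dk for ck, dk in zip(c[1:], nabla[1:]))   # (6.11.3)

# Ordinate y(1.3) from (6.11.1) with the halved spacing h = 0.1.
h = 0.1
y_12, y_14, d2_14 = 1.38000, 1.74164, 2.43830
y_13 = 0.5 * (y_14 + y_12) - 0.5 * h * h * (
    d2_14 - float(nabla1_half) * 1e-5 + float(nabla2_half) * 1e-5 / 12
)
print(float(nabla1_half), float(nabla2_half), round(y_13, 5))
```

With the series truncated in this way, the transformed differences should agree with the entries of Table 6.5 to within about a unit in the last place, and the interpolated ordinate with the tabulated $y(1.3)$.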


Table 6.5

  x      y          y''        ∇y''     ∇²y''
 1.2    1.38000    1.65600
 1.3    1.55071    2.01592    35992
 1.4    1.74164    2.43830    42238     6246


Then if, say, $\nabla^2y''(1.5)$ is estimated roughly as being equal to $\nabla^2y''(1.4)$, the
line

 1.5 | 1.95701 | 2.92314 | 48484 | 6246 |          (6.11.4)

is obtained (from right to left), the first approximation to $y(1.5)$ being obtained
by use of (6.10.6). When $y''(1.5)$ and its differences are recalculated (from
left to right), the next approximation to $y(1.5)$ is obtained from (6.10.6) as
1.95702, and the final form of this line of the calculation reads as follows:


 1.5 | 1.95702 | 2.93553 | 49723 | 7485 |          (6.11.5)


Sufficient data are now available for the use of (6.10.5) as a predictor in the next
step, if this is desired, after which entries in the $\gamma$ column are again calculable.
(Variations and modifications of this procedure are easily devised here and in
other cases.)


6.12 Use of Higher Derivatives 


It is possible to derive a variety of formulas, for the numerical integration of
differential equations, which involve values of certain higher derivatives of the
unknown function. In particular, the Euler-Maclaurin sum formula (5.8.17)
can be expressed in the form

$$\begin{aligned} y_{n+1} - y_{n-p} ={}& h\left(\tfrac12y_{n+1}' + y_n' + y_{n-1}' + \cdots + y_{n-p+1}' + \tfrac12y_{n-p}'\right) \\ & - \frac{h^2}{12}(y_{n+1}'' - y_{n-p}'') + \frac{h^4}{720}(y_{n+1}^{iv} - y_{n-p}^{iv}) \\ & - \frac{h^6}{30240}(y_{n+1}^{vi} - y_{n-p}^{vi}) + \cdots \end{aligned} \tag{6.12.1}$$

where the error committed by truncation with the term of order $h^{2k}$ is $(p + 1)h$
times the term of order $h^{2k+2}$ with the contents of the relevant parentheses
replaced by $y^{(2k+3)}(\xi)$, where $x_{n-p} < \xi < x_{n+1}$. Thus, for example, with
$p = 0$ we have the formula

$$y_{n+1} = y_n + \frac{h}{2}(y_{n+1}' + y_n') - \frac{h^2}{12}(y_{n+1}'' - y_n'') + \frac{h^5}{720}y^{v}(\xi) \tag{6.12.2}$$

of closed type, which may be used with any convenient predictor formula
(preferably also with an error of order $h^5$) as in the methods discussed previously.
Formula (6.12.2) can be obtained also as a special case of the so-called
Hermite interpolation formula (to be discussed in Sec. 8.2) and can also be
derived by a method of undetermined coefficients, in which we write

$$y_{n+1} = y_n + h(\alpha_0y_{n+1}' + \alpha_1y_n') + h^2(\beta_0y_{n+1}'' + \beta_1y_n'') + E$$

so that $E = 0$ if $y(x)$ is a constant, and determine $\alpha_0$, $\alpha_1$, $\beta_0$, and $\beta_1$ in such a
way that $E = 0$ also when $y(x) = x$, $x^2$, $x^3$, and $x^4$, and hence for any polynomial
of degree 4 or less. For this purpose, it is convenient and nonrestrictive to
take $h = 1$ and $x_n = 0$, so that the relevant equations become

$$\begin{aligned} \alpha_0 + \alpha_1 &= 1 & 2\alpha_0 + 2(\beta_0 + \beta_1) &= 1 \\ 3\alpha_0 + 6\beta_0 &= 1 & 4\alpha_0 + 12\beta_0 &= 1 \end{aligned}$$

and yield $\alpha_0 = \alpha_1 = \tfrac12$ and $\beta_0 = -\beta_1 = -\tfrac{1}{12}$, in accordance with (6.12.2).
The error term can then be determined by the methods of Secs. 5.10 and 5.11,
if the formula is first rewritten in the equivalent form

$$\int_0^1f'(s)\,ds = \tfrac12[f'(0) + f'(1)] + \tfrac{1}{12}[f''(0) - f''(1)] + E$$

where

$$f(s) = y(x_n + sh)$$

Reference to Sec. 5.11 then gives

$$E = \int_0^1\pi(s)f'[0, 0, 1, 1, s]\,ds \qquad \pi(s) = s^2(s - 1)^2$$

and, since $\pi(s)$ does not change sign, there follows

$$E = \frac{f^{v}(\eta)}{4!}\int_0^1\pi(s)\,ds = \frac{1}{720}f^{v}(\eta) = \frac{1}{720}h^5y^{v}(x_n + \eta h) \equiv \frac{1}{720}h^5y^{v}(\xi)$$

where $0 < \eta < 1$, and hence $x_n < \xi < x_{n+1}$.
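
The four conditions of the method of undetermined coefficients form a small linear system that can be solved exactly. The following sketch is illustrative only; the helper `solve_linear` is a hypothetical plain Gauss-Jordan elimination over rational numbers, introduced so that no external library is assumed. The error constant is recovered by applying the formula to $y = x^5$ (with $h = 1$, $x_n = 0$) and dividing the defect by $5!$.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Gauss-Jordan elimination over exact fractions (nonsingular system assumed)."""
    n = len(b)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]


# Unknowns ordered as (alpha_0, alpha_1, beta_0, beta_1), with h = 1 and x_n = 0.
A = [
    [1, 1, 0, 0],     # exactness for y = x
    [2, 0, 2, 2],     # exactness for y = x^2
    [3, 0, 6, 0],     # exactness for y = x^3
    [4, 0, 12, 0],    # exactness for y = x^4
]
b = [1, 1, 1, 1]
alpha0, alpha1, beta0, beta1 = solve_linear(A, b)

# Defect for y = x^5: y(1) - y(0) minus the right-hand member of the formula.
defect = 1 - (alpha0 * 5 + beta0 * 20)
error_constant = defect / 120      # coefficient of h^5 y^v(xi)
print(alpha0, alpha1, beta0, beta1, error_constant)
```

The defect computed from $x^5$ gives the same constant as the integral of $\pi(s)/4!$ over $(0, 1)$, which is a useful cross-check on the error term.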
In the same way, a formula of open type, involving only $y_{n+1}$, $y_{n-1}$,
$y_{n-1}'$, $y_n''$, and $y_{n-1}''$, can be obtained in the form

$$y_{n+1} = y_{n-1} + 2hy_{n-1}' + \frac{2h^2}{3}(2y_n'' + y_{n-1}'') + \frac{2h^5}{45}y^{v}(\xi) \tag{6.12.3}$$

and can be used as a predictor in connection with (6.12.2). Since (6.12.2)
affords an accuracy which is generally better than that associated with the
result of retaining fourth-order terms in the Taylor expansion

$$y_{n+1} = y_n + hy_n' + \frac{h^2}{2}y_n'' + \frac{h^3}{6}y_n''' + \frac{h^4}{24}y_n^{iv} + \frac{h^5}{120}y^{v}(\xi)$$

it is often useful in starting the solution when a procedure of fourth order is
appropriate and when the calculation of values of $y'''$ and $y^{iv}$ is to be avoided.
The formula $y_1 \approx y_0 + hy_0' + \tfrac12h^2y_0''$ can be used for a prediction in the first
step, after which (6.12.3) is available.

When the differential equation is of second order, Eqs. (6.12.2) and
(6.12.3) are to be supplemented by the two equations obtained by replacing $y$
by $u$, where $u = y'$. Formulas of higher-order accuracy may be obtained if
derivatives of order three or more are also employed.

A useful class of formulas, associated with the name of Obrechkoff
[1942], can be derived by an inverse method in which we first seek a formula
for $\int_0^h\varphi(x)\,dx$ with an error expressed in the form

$$E = \frac{1}{(2r)!}\int_0^hx^r(x - h)^r\varphi^{(2r)}(x)\,dx \tag{6.12.4}$$

where $r$ is an arbitrarily prescribed integer. If we integrate by parts $r$ times, there
follows immediately

$$\begin{aligned} E &= \frac{(-1)^r}{(2r)!}\int_0^h\varphi^{(r)}(x)\frac{d^r}{dx^r}[x^r(x - h)^r]\,dx \\ &= \frac{1}{(2r)!}\int_0^h\varphi^{(r)}(x)\frac{d^r}{dx^r}[x^r(h - x)^r]\,dx \end{aligned}$$

since the integrated terms vanish at both limits, and $r$ additional integrations
by parts yield the result

$$E = \int_0^h\varphi(x)\,dx - \frac{r!}{(2r)!}\sum_{k=1}^r(-1)^{k-1}\frac{(2r - k)!}{k!\,(r - k)!}h^k[\varphi^{(k-1)}(h) + (-1)^{k-1}\varphi^{(k-1)}(0)] \tag{6.12.5}$$

which supplies the required formula after a transposition.

If we write $\varphi(x) = y'(x)$, and translate the origin to $x_n$, the result takes
the form

$$y_{n+1} = y_n + \frac{r!}{(2r)!}\sum_{k=1}^r(-1)^{k-1}\frac{(2r - k)!}{k!\,(r - k)!}h^k[y_{n+1}^{(k)} + (-1)^{k-1}y_n^{(k)}] + E \tag{6.12.6}$$

where, after an application of the second mean-value theorem, (6.12.4) becomes

$$E = \frac{y^{(2r+1)}(\xi)}{(2r)!}\int_0^hx^r(x - h)^r\,dx = (-1)^r\frac{(r!)^2}{(2r)!\,(2r + 1)!}h^{2r+1}y^{(2r+1)}(\xi) \tag{6.12.7}$$

with $x_n < \xi < x_{n+1}$.

When $r = 2$, this result becomes identical with (6.12.2). However, when
$r = 3$ we obtain the formula

$$y_{n+1} = y_n + \frac{h}{2}(y_{n+1}' + y_n') - \frac{h^2}{10}(y_{n+1}'' - y_n'') + \frac{h^3}{120}(y_{n+1}''' + y_n''') - \frac{h^7}{100800}y^{vii}(\xi) \tag{6.12.8}$$

which possesses obvious advantages (in general) over the corresponding
formula

$$y_{n+1} = y_n + \frac{h}{2}(y_{n+1}' + y_n') - \frac{h^2}{12}(y_{n+1}'' - y_n'') + \frac{h^4}{720}(y_{n+1}^{iv} - y_n^{iv}) - \frac{h^7}{30240}y^{vii}(\xi) \tag{6.12.9}$$

obtained from (6.12.1). An appropriate predictor formula can be obtained
in the form

$$y_{n+1} = y_{n-1} + 2h(4y_n' - 3y_{n-1}') - \frac{2h^2}{5}(8y_n'' + 7y_{n-1}'') + \frac{2h^3}{15}(7y_n''' - 3y_{n-1}''') + \frac{13h^7}{6300}y^{vii}(\xi) \tag{6.12.10}$$
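
The coefficients of the two-point formulas of this class can be tabulated for any $r$. The sketch below is illustrative and uses hypothetical function names; it evaluates the coefficient of $h^k[y_n^{(k)} + (-1)^{k-1}y_{n+1}^{(k)}]$, namely $r!\,(2r - k)!/[(2r)!\,k!\,(r - k)!]$, together with the error constant of (6.12.7), in exact rational arithmetic.

```python
from fractions import Fraction
from math import factorial


def obrechkoff_coefficients(r):
    """Coefficients c_k of h^k [y_n^(k) + (-1)^(k-1) y_(n+1)^(k)], k = 1..r."""
    return [
        Fraction(factorial(r) * factorial(2 * r - k),
                 factorial(2 * r) * factorial(k) * factorial(r - k))
        for k in range(1, r + 1)
    ]


def obrechkoff_error_constant(r):
    """Coefficient of h^(2r+1) y^(2r+1)(xi) in the error term (6.12.7)."""
    return Fraction((-1) ** r * factorial(r) ** 2,
                    factorial(2 * r) * factorial(2 * r + 1))


for r in (2, 3):
    print(r, obrechkoff_coefficients(r), obrechkoff_error_constant(r))
```

For $r = 2$ the coefficients are those of (6.12.2), and for $r = 3$ those of (6.12.8), so the routine gives a quick check on the signs and denominators of both formulas.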
An infinite variety of other formulas can be derived by employing data
relevant to more than two points for correction, and to more than three points
for prediction. Thus, for example, the three-point formula of highest precision,
using first and second derivatives, is readily found to be

$$y_{n+1} - 2y_n + y_{n-1} = \frac{3h}{8}(y_{n+1}' - y_{n-1}') - \frac{h^2}{24}(y_{n+1}'' - 8y_n'' + y_{n-1}'') + \frac{h^8}{60480}y^{viii}(\xi) \tag{6.12.11}$$

6.13 A Simple Runge-Kutta Method 


The methods associated with the names of Runge [1895], Kutta [1901], Heun
[1900], and others as applied to the numerical solution of the problem

$$y' = F(x, y) \qquad y(x_0) = y_0 \tag{6.13.1}$$

effectively replace the result of truncating a Taylor-series expansion of the form

$$y_{n+1} = y_n + hy_n' + \frac{h^2}{2}y_n'' + \frac{h^3}{6}y_n''' + \cdots \tag{6.13.2}$$

by an approximation in which $y_{n+1}$ is calculated from a formula of the type

$$y_{n+1} = y_n + h[\alpha_0F(x_n, y_n) + \alpha_1F(x_n + \mu_1h, y_n + b_1h) + \alpha_2F(x_n + \mu_2h, y_n + b_2h) + \cdots + \alpha_pF(x_n + \mu_ph, y_n + b_ph)] \tag{6.13.3}$$

Here the $\alpha$'s, $\mu$'s, and $b$'s are so determined that, if the right-hand member of
(6.13.3) were expanded in powers of the spacing $h$, the coefficients of a certain
number of the leading terms would agree with the corresponding coefficients
in (6.13.2).

These methods possess the advantages that they are self-starting and do not require
the evaluation of derivatives of F(x, y) and hence can be used (even at the 
beginning of the solution) when F(x, y) is not given by an analytical expression, 
and also that a change in spacing is easily effected at any intermediate stage of the 
calculation. On the other hand, each step involves several evaluations of 
F(x, y), which may be excessively laborious and/or time-consuming, and also the 
estimation of errors is less simply accomplished than in the previously described 
methods. 

It is convenient, in order both to simplify the derivation and also to
systematize the formulation, to express each of the $b$'s in (6.13.3) as a linear
combination of the preceding values of $F$. Thus, in place of using the notation
of (6.13.3), it is desirable to write the approximation in the form

$$y_{n+1} = y_n + \alpha_0k_0 + \alpha_1k_1 + \cdots + \alpha_pk_p \tag{6.13.4}$$

with

$$\begin{aligned} k_0 &= hF(x_n, y_n) \\ k_1 &= hF(x_n + \mu_1h, y_n + \lambda_{10}k_0) \\ k_2 &= hF(x_n + \mu_2h, y_n + \lambda_{20}k_0 + \lambda_{21}k_1) \\ &\;\;\vdots \\ k_p &= hF(x_n + \mu_ph, y_n + \lambda_{p0}k_0 + \lambda_{p1}k_1 + \cdots + \lambda_{p,p-1}k_{p-1}) \end{aligned} \tag{6.13.5}$$

where the coefficients $\alpha_i$, $\mu_i$, and $\lambda_{ij}$ are to be determined.

Since the actual derivation of such formulas involves considerable algebraic
manipulation, we consider in detail only the very simple case $p = 1$, which may
serve to illustrate the procedure in the more general case. Thus, writing $\mu$
for $\mu_1$ and $\lambda$ for $\lambda_{10}$, we proceed to determine $\alpha_0$, $\alpha_1$, $\mu$, and $\lambda$ such that

$$y_{n+1} = y_n + \alpha_0k_0 + \alpha_1k_1 \tag{6.13.6}$$

with

$$k_0 = hF(x_n, y_n) \qquad k_1 = hF(x_n + \mu h, y_n + \lambda k_0) \tag{6.13.7}$$

possesses an expansion in powers of $h$ whose leading terms agree, insofar as
possible, with the leading terms of (6.13.2).

We first obtain the expansion
ALF + (uhF,, + AkoF,) 

+ (wh? F,, + 2pAhkoF,y + A?koFyy) + O(h*)] 
hF + h?(uF, + AFF,) 


ky 


3 
ὺ τ (WF + ΖμΑΡΕ, + 2°F*Fyy) + O(h*) (6.13.8) 


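where $F = F(x_n, y_n)$, $F_x = F_x(x_n, y_n)$, and so forth. Hence (6.13.6) becomes
$$y_{n+1} = y_n + h(\alpha_0 + \alpha_1)F + h^2\alpha_1(\mu F_x + \lambda F F_y) + \frac{h^3}{2}\alpha_1(\mu^2 F_{xx} + 2\mu\lambda F F_{xy} + \lambda^2 F^2 F_{yy}) + O(h^4)$$   (6.13.9)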


On the other hand, with the same abbreviated notation, we obtain from 
(6.13.1) the relations 


$$\begin{aligned}
y' &= F \\
y'' &= F_x + F F_y \\
y''' &= F_{xx} + 2F F_{xy} + F^2 F_{yy} + F_y(F_x + F F_y)
\end{aligned}$$   (6.13.10)
so that (6.13.2) becomes 


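$$y_{n+1} = y_n + hF + \frac{h^2}{2}(F_x + F F_y) + \frac{h^3}{6}\left[F_{xx} + 2F F_{xy} + F^2 F_{yy} + F_y(F_x + F F_y)\right] + O(h^4)$$   (6.13.11)
Thus, if we identify the coefficients of $hF$, $h^2F_x$, and $h^2FF_y$ in (6.13.9) and
(6.13.11), we obtain the three conditions
$$\alpha_0 + \alpha_1 = 1 \qquad \mu\alpha_1 = \tfrac{1}{2} \qquad \lambda\alpha_1 = \tfrac{1}{2}$$   (6.13.12)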
involving the four adjustable parameters, which are satisfied if and only if 


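$$\alpha_0 = 1 - c \qquad \alpha_1 = c \qquad \mu = \frac{1}{2c} \qquad \lambda = \frac{1}{2c}$$
where $c$ is an arbitrary nonzero constant. The expansion (6.13.9) then reduces to

$$y_{n+1} = y_n + hF + \frac{h^2}{2}(F_x + F F_y) + \frac{h^3}{8c}(F_{xx} + 2F F_{xy} + F^2 F_{yy}) + O(h^4)$$   (6.13.13)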


and reference to (6.13.10) shows that (6.13.13) or, equivalently, (6.13.6) would
then be brought into agreement with (6.13.11) or (6.13.2) if a truncation-error
term of the form

$$T_n = \left(\frac{h^3}{6} - \frac{h^3}{8c}\right)\left[(F_{xx} + 2F F_{xy} + F^2 F_{yy}) + F_y(F_x + F F_y)\right] + \frac{h^3}{8c}F_y(F_x + F F_y) + O(h^4)$$

or
$$T_n = -\frac{h^3}{24c}\left[(3 - 4c)\,y'''_n - 3F_y(x_n, y_n)\,y''_n\right] + O(h^4)$$   (6.13.14)

were added to its right-hand member.

The remaining free parameter $c$ clearly cannot be determined so that
$T_n$ is of order $h^4$, except in trivial special cases. One convenient choice is $c = \tfrac{1}{2}$,
in which case the second abscissa involved in (6.13.6) and (6.13.7) is $x_{n+1}$,
and the formula becomes

$$y_{n+1} = y_n + \tfrac{1}{2}(k_0 + k_1) + T_n$$   (6.13.15)
with

$$k_0 = hF(x_n, y_n) \qquad k_1 = hF(x_n + h,\; y_n + k_0)$$   (6.13.16)

where

$$T_n = -\frac{h^3}{12}\left[y'''_n - 3F_y(x_n, y_n)\,y''_n\right] + O(h^4)$$   (6.13.17)


Stepwise calculation based on this formula is sometimes known as Heun’s 
method. 
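
The one-parameter family just derived can be sketched in a few lines. The block below is only an illustration under stated assumptions: the function names `two_stage_step` and `advance`, and the test equation $y' = x - y$, $y(0) = 1$ (with solution $y = x - 1 + 2e^{-x}$), are not taken from the text. The choice $c = \tfrac{1}{2}$ gives (6.13.15) with (6.13.16).

```python
import math

def two_stage_step(F, x, y, h, c=0.5):
    """One step of (6.13.6)-(6.13.7) with alpha0 = 1 - c, alpha1 = c, mu = lambda = 1/(2c)."""
    mu = 1.0 / (2.0 * c)            # the same value serves as lambda
    k0 = h * F(x, y)
    k1 = h * F(x + mu * h, y + mu * k0)
    return y + (1.0 - c) * k0 + c * k1

def advance(F, x0, y0, x_end, n, c=0.5):
    """Take n equal steps from x0 to x_end and return the final ordinate."""
    h = (x_end - x0) / n
    y = y0
    for i in range(n):
        y = two_stage_step(F, x0 + i * h, y, h, c)
    return y

def F(x, y):
    # assumed test equation y' = x - y, y(0) = 1
    return x - y

exact = 2.0 * math.exp(-1.0)        # y(1) for the assumed test problem
for c in (0.5, 1.0):
    e_h = abs(advance(F, 0.0, 1.0, 1.0, 20, c) - exact)
    e_half = abs(advance(F, 0.0, 1.0, 1.0, 40, c) - exact)
    print(c, e_h, e_half, e_h / e_half)   # a ratio near 4 reflects the h^3 local error
```

Halving the spacing should reduce the accumulated error by a factor of about 4 for any admissible $c$, in keeping with a truncation error of order $h^3$ per step.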
If, for all values of x and y involved in the calculation, it is known that 


$$|F_y(x, y)| \le K$$   (6.13.18)


then, as in earlier developments, it is readily shown that the propagated error
$\varepsilon_n$ in the nth step is dominated by the solution of the difference equation

$$e_{n+1} = e_n + \frac{hK}{2}e_n + \frac{hK}{2}(e_n + hKe_n) + E$$
or
$$e_{n+1} = \left(1 + hK + \frac{h^2K^2}{2}\right)e_n + E$$   (6.13.19)
where

$$e_0 = 0 \qquad E = |T_n + R_n|_{\max} \qquad hK + \frac{h^2K^2}{2} < 1$$   (6.13.20)

Further, it can be shown that (6.13.17) can be replaced by


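$$T_n = -\frac{h^3}{12}\left[y'''(\xi_1) - 3F_y(x_{n+1}, \eta)\,y''(\xi_2)\right]$$   (6.13.21)
where $\xi_1$ and $\xi_2$ are intermediate between $x_n$ and $x_{n+1}$, and $\eta$ between $y_{n+1}$
and $y_n + hy'_n$. Thus, if the roundoff error $R_n$ is ignored, and if
$$|y''(x)| \le M_2 \qquad |y'''(x)| \le M_3$$   (6.13.22)

it follows after a simple calculation that
$$|\varepsilon_n| \le \frac{h^2(M_3 + 3KM_2)}{12K(1 + \frac{1}{2}hK)}\left[\left(1 + hK + \frac{h^2K^2}{2}\right)^n - 1\right]$$   (6.13.23)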
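
As a rough numerical check of a bound of this kind, the sketch below solves $e_{n+1} = (1 + hK + \tfrac{1}{2}h^2K^2)e_n + E$ with $e_0 = 0$ and $E = h^3(M_3 + 3KM_2)/12$ in closed form and compares it with the actual error for an assumed test problem, $y' = -y$, $y(0) = 1$ on $[0, 1]$, for which $K = M_2 = M_3 = 1$ may be taken. The function names are illustrative only, and roundoff is ignored.

```python
import math

def heun_solve(F, x0, y0, h, n):
    """Advance n steps of (6.13.15)-(6.13.16) and return the final ordinate."""
    y = y0
    for i in range(n):
        x = x0 + i * h
        k0 = h * F(x, y)
        k1 = h * F(x + h, y + k0)
        y += 0.5 * (k0 + k1)
    return y

def heun_error_bound(h, n, K, M2, M3):
    """Closed-form solution of the dominating difference equation, roundoff ignored."""
    growth = (1.0 + h * K + 0.5 * (h * K) ** 2) ** n - 1.0
    return h * h * (M3 + 3.0 * K * M2) / (12.0 * K * (1.0 + 0.5 * h * K)) * growth

h, n = 0.1, 10
actual = abs(heun_solve(lambda x, y: -y, 0.0, 1.0, h, n) - math.exp(-1.0))
bound = heun_error_bound(h, n, K=1.0, M2=1.0, M3=1.0)
print(actual, bound)    # the bound is expected to be pessimistic but valid
```

In this instance the bound overestimates the true error by nearly an order of magnitude, which is typical of estimates that replace every derivative by its maximum.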


The formula (6.13.15), using (6.13.16), is of limited accuracy. Indeed, 
it can be considered to be a modification of the result of retaining only the 
first difference in (6.3.2),

$$y_{n+1} = y_n + \frac{h}{2}(y'_n + y'_{n+1}) - \frac{h^3}{12}y'''(\xi)$$   (6.13.24)

in which the unknown derivative $y'_{n+1} = F(x_{n+1}, y_{n+1})$ is replaced by the
approximation $y'_{n+1} \approx F(x_{n+1}, y_n + hy'_n)$. This consideration is useful in
deriving (6.13.21). The details of the analysis were presented here principally 
to illustrate the similar but more complicated analysis relevant to formulas of 
higher-order accuracy, certain of which are listed in the following section. 

It is of some importance to notice that the error (6.13.21), associated with 
(6.13.15) and (6.13.16), depends upon the form of the function F(x, y) as well 
as upon the solution y itself. This situation is characteristic of formulas of the 
Runge-Kutta type. For example, whereas the equations $y' = 2(x + 1)$ and
$y' = 2y/(x + 1)$ both define the function $y = (x + 1)^2$ when the condition
$y(0) = 1$ is imposed, the formula defined by (6.13.15) and (6.13.16) would
yield this solution exactly when applied to the first equation, if no roundoffs 
were committed, but would not do so when applied to the second form. On 
the other hand, the formula (6.13.24) would yield exact results when applied 
to either form, or to any other first-order equation whose required solution 
is a polynomial of degree 2 or less (see also Milne [1950, 1970]). 

At the same time, the mere fact that (6.13.15), with (6.13.16), does not 
have this last property does not imply that its interpretation as a weakened 
modification of (6.13.24) is proper in the more general case when the true 
solution is not such a polynomial. For example, it is easily seen that the use of 
(6.13.15) and (6.13.16) would yield exact results when applied to the problem 
$y' = -y/(x + 1)$, $y(0) = 1$, for which the solution is $y = 1/(x + 1)$, whereas
the use of (6.13.24) would lead only to an approximation. 
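
The three exactness claims made for (6.13.15) and (6.13.16) can be checked without roundoff by carrying the steps in rational arithmetic. The following is a minimal sketch; the helper names `heun_step` and `march`, the spacing $h = 1/10$, and the number of steps are arbitrary choices made here for illustration.

```python
from fractions import Fraction

def heun_step(F, x, y, h):
    """One step of (6.13.15)-(6.13.16) with the truncation term dropped."""
    k0 = h * F(x, y)
    k1 = h * F(x + h, y + k0)
    return y + (k0 + k1) / 2

def march(F, y0, h, n):
    """Advance n steps from x = 0 in exact rational arithmetic."""
    x, y = Fraction(0), Fraction(y0)
    for _ in range(n):
        y = heun_step(F, x, y, h)
        x += h
    return x, y

h = Fraction(1, 10)

x, y = march(lambda x, y: 2 * (x + 1), 1, h, 5)       # y' = 2(x + 1)
first_form_exact = (y == (x + 1) ** 2)

x, y = march(lambda x, y: 2 * y / (x + 1), 1, h, 5)   # y' = 2y/(x + 1)
second_form_exact = (y == (x + 1) ** 2)

x, y = march(lambda x, y: -y / (x + 1), 1, h, 5)      # y' = -y/(x + 1)
reciprocal_exact = (y == 1 / (x + 1))

print(first_form_exact, second_form_exact, reciprocal_exact)
```

The comparison is made with `==` on `Fraction` values, so a result of `True` means exact agreement and not merely agreement to working precision.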
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6.14 Runge-Kutta Methods of Higher Order 


When $k_0$, $k_1$, and $k_2$ are employed in (6.13.4), corresponding to $p = 2$, it is
found that the requirement that the expansion of the right-hand member be
correct through $h^3$ terms imposes only six conditions on the eight arbitrary
parameters involved, so that a doubly infinite set of such formulas with
third-order accuracy can be obtained.

One such formula, due to Kutta, is of the form

$$y_{n+1} = y_n + \tfrac{1}{6}(k_0 + 4k_1 + k_2) + O(h^4)$$   (6.14.1)
with
$$\begin{aligned}
k_0 &= hF(x_n, y_n) \\
k_1 &= hF(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}k_0) \\
k_2 &= hF(x_n + h,\; y_n + 2k_1 - k_0)
\end{aligned}$$   (6.14.2)
A second formula, due to Heun, is of the form
$$y_{n+1} = y_n + \tfrac{1}{4}(k_0 + 3k_2) + O(h^4)$$   (6.14.3)
with
$$\begin{aligned}
k_0 &= hF(x_n, y_n) \\
k_1 &= hF(x_n + \tfrac{1}{3}h,\; y_n + \tfrac{1}{3}k_0) \\
k_2 &= hF(x_n + \tfrac{2}{3}h,\; y_n + \tfrac{2}{3}k_1)
\end{aligned}$$   (6.14.4)
These two formulas are generally of about equal accuracy, with each possessing 
certain obvious computational advantages. Kutta’s form is seen to be analogous 
to the formula of Simpson’s rule and would reduce to that formula if F were 
independent of y. 
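
The two third-order formulas may be sketched as follows. The function names and the test data are assumptions made for illustration; the first check evaluates Kutta's step for an $F$ that does not depend on $y$ and compares it with Simpson's rule over one interval.

```python
import math

def kutta3_step(F, x, y, h):
    """One step of Kutta's third-order formula (6.14.1)-(6.14.2)."""
    k0 = h * F(x, y)
    k1 = h * F(x + h / 2, y + k0 / 2)
    k2 = h * F(x + h, y + 2 * k1 - k0)
    return y + (k0 + 4 * k1 + k2) / 6

def heun3_step(F, x, y, h):
    """One step of Heun's third-order formula (6.14.3)-(6.14.4)."""
    k0 = h * F(x, y)
    k1 = h * F(x + h / 3, y + k0 / 3)
    k2 = h * F(x + 2 * h / 3, y + 2 * k1 / 3)
    return y + (k0 + 3 * k2) / 4

# F independent of y: Kutta's step is Simpson's rule on [0, 0.5]
simpson = (0.5 / 6) * (math.cos(0.0) + 4 * math.cos(0.25) + math.cos(0.5))
kutta_quadrature = kutta3_step(lambda x, y: math.cos(x), 0.0, 0.0, 0.5)
print(kutta_quadrature - simpson)

# one step of each formula for the assumed problem y' = -y, y(0) = 1
for step in (kutta3_step, heun3_step):
    print(step(lambda x, y: -y, 0.0, 1.0, 0.1) - math.exp(-0.1))
```

Heun's form uses $k_1$ only to build $k_2$, which saves one multiplication in the final combination, whereas Kutta's form has the symmetric Simpson weights.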

It is also possible to derive a two-parameter family of formulas of fourth- 
order accuracy, by retaining an additional k in (6.13.4). The simplest such 
formula, due to Kutta, is of the form 

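$$y_{n+1} = y_n + \tfrac{1}{6}(k_0 + 2k_1 + 2k_2 + k_3) + O(h^5)$$   (6.14.5)
with
$$\begin{aligned}
k_0 &= hF(x_n, y_n) \\
k_1 &= hF(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}k_0) \\
k_2 &= hF(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}k_1) \\
k_3 &= hF(x_n + h,\; y_n + k_2)
\end{aligned}$$   (6.14.6)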


and would also reduce to Simpson’s rule if F were independent of y. 
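
A minimal sketch of the fourth-order formula (6.14.5) with (6.14.6) follows. The names `rk4_step` and `rk4_solve` and the test equation $y' = x + y$, $y(0) = 1$ (solution $y = 2e^x - x - 1$) are assumptions introduced here for illustration.

```python
import math

def rk4_step(F, x, y, h):
    """One step of (6.14.5)-(6.14.6)."""
    k0 = h * F(x, y)
    k1 = h * F(x + h / 2, y + k0 / 2)
    k2 = h * F(x + h / 2, y + k1 / 2)
    k3 = h * F(x + h, y + k2)
    return y + (k0 + 2 * k1 + 2 * k2 + k3) / 6

def rk4_solve(F, x0, y0, h, n):
    """Advance n steps of spacing h from (x0, y0) and return the final ordinate."""
    y = y0
    for i in range(n):
        y = rk4_step(F, x0 + i * h, y, h)
    return y

F = lambda x, y: x + y                  # assumed test equation
exact = 2.0 * math.e - 2.0              # y(1) for y(0) = 1
err_h = abs(rk4_solve(F, 0.0, 1.0, 0.1, 10) - exact)
err_half = abs(rk4_solve(F, 0.0, 1.0, 0.05, 20) - exact)
print(err_h, err_half, err_h / err_half)   # ratio near 16 for fourth-order accuracy
```

Each step costs four evaluations of $F$, which is the price paid for not having to store or start from earlier ordinates.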
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Such formulas can also be generalized to the treatment of simultaneous 
equations of the form 


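$$\begin{aligned}
\frac{dy}{dx} &= F(x, y, u) \\
\frac{du}{dx} &= G(x, y, u)
\end{aligned}$$   (6.14.7)

where $y$ and $u$ are prescribed when $x = x_0$. In particular, the preceding formula
generalizes as follows:

$$\begin{aligned}
y_{n+1} &= y_n + \tfrac{1}{6}(k_0 + 2k_1 + 2k_2 + k_3) + O(h^5) \\
u_{n+1} &= u_n + \tfrac{1}{6}(m_0 + 2m_1 + 2m_2 + m_3) + O(h^5)
\end{aligned}$$   (6.14.8)
with
$$\begin{aligned}
k_0 &= hF(x_n, y_n, u_n) \\
k_1 &= hF(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}k_0,\; u_n + \tfrac{1}{2}m_0) \\
k_2 &= hF(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}k_1,\; u_n + \tfrac{1}{2}m_1) \\
k_3 &= hF(x_n + h,\; y_n + k_2,\; u_n + m_2)
\end{aligned}$$   (6.14.9)
and
$$\begin{aligned}
m_0 &= hG(x_n, y_n, u_n) \\
m_1 &= hG(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}k_0,\; u_n + \tfrac{1}{2}m_0) \\
m_2 &= hG(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}k_1,\; u_n + \tfrac{1}{2}m_1) \\
m_3 &= hG(x_n + h,\; y_n + k_2,\; u_n + m_2)
\end{aligned}$$   (6.14.10)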
A consideration of this form indicates the way in which other formulas are so 
generalized. 
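In particular, when $F = u$, so that (6.14.7) is equivalent to
$$\frac{d^2y}{dx^2} = G(x, y, y')$$   (6.14.11)
with $u = y'$, (6.14.9) gives

$$k_0 = hy'_n \qquad k_1 = hy'_n + \tfrac{1}{2}hm_0 \qquad k_2 = hy'_n + \tfrac{1}{2}hm_1 \qquad k_3 = hy'_n + hm_2$$
and hence (6.14.8) and (6.14.10) reduce to

$$\begin{aligned}
y_{n+1} &= y_n + hy'_n + \tfrac{1}{6}h(m_0 + m_1 + m_2) + O(h^5) \\
y'_{n+1} &= y'_n + \tfrac{1}{6}(m_0 + 2m_1 + 2m_2 + m_3) + O(h^5)
\end{aligned}$$   (6.14.12)
with
$$\begin{aligned}
m_0 &= hG(x_n, y_n, y'_n) \\
m_1 &= hG(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}hy'_n,\; y'_n + \tfrac{1}{2}m_0) \\
m_2 &= hG(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}hy'_n + \tfrac{1}{4}hm_0,\; y'_n + \tfrac{1}{2}m_1) \\
m_3 &= hG(x_n + h,\; y_n + hy'_n + \tfrac{1}{2}hm_1,\; y'_n + m_2)
\end{aligned}$$   (6.14.13)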


The use of this formula is clearly simplified in those cases when $G$ is independent
of $y'$.
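
The specialization to a single second-order equation $y'' = G(x, y, y')$ can be sketched directly from the four quantities $m_0, \ldots, m_3$. The function name and the test problem $y'' = -y$, $y(0) = 0$, $y'(0) = 1$ (solution $y = \sin x$) are assumptions chosen for illustration.

```python
import math

def rk4_second_order_step(G, x, y, yp, h):
    """One fourth-order step for y'' = G(x, y, y'); returns (y_next, yp_next)."""
    m0 = h * G(x, y, yp)
    m1 = h * G(x + h / 2, y + h * yp / 2, yp + m0 / 2)
    m2 = h * G(x + h / 2, y + h * yp / 2 + h * m0 / 4, yp + m1 / 2)
    m3 = h * G(x + h, y + h * yp + h * m1 / 2, yp + m2)
    y_next = y + h * yp + h * (m0 + m1 + m2) / 6
    yp_next = yp + (m0 + 2 * m1 + 2 * m2 + m3) / 6
    return y_next, yp_next

y, yp, h = 0.0, 1.0, 0.1
for i in range(10):
    y, yp = rk4_second_order_step(lambda x, y, yp: -y, i * h, y, yp, h)
print(y - math.sin(1.0), yp - math.cos(1.0))
```

Only the $m$'s need be formed explicitly; the $k$'s of the general system have been absorbed into the arguments of $G$.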

Many variations and generalizations of these formulas are present in the 
literature, some of which afford certain computational advantages in certain 
situations. One such modification, due to Gill [1951], is of particular usefulness 
when the computation is to be effected by a computer in which it is desirable to 
minimize the storage of data.

No simple expressions are known for the precise truncation errors in the
preceding formulas. An estimate of the error can be obtained, in practice, in
the following way. Let the truncation error associated with a formula of
$r$th-order accuracy, in progressing from the ordinate at $x_n$ to that at $x_{n+1} = x_n + h$,
in a single step, be denoted by $C_n h^{r+1}$, and suppose that $C_n$ varies slowly with
$n$ and is nearly independent of $h$ when $h$ is small. Then if the true ordinate at
$x_{n+1}$ is denoted by $Y_{n+1}$, the value obtained by two steps starting at $x_{n-1}$ by
$y^{(h)}_{n+1}$, and the value obtained by a single step with doubled spacing $2h$ by $y^{(2h)}_{n+1}$,
there follows approximately

$$\begin{aligned}
Y_{n+1} - y^{(h)}_{n+1} &\approx 2C_n h^{r+1} \\
Y_{n+1} - y^{(2h)}_{n+1} &\approx 2^{r+1}C_n h^{r+1}
\end{aligned}$$   (6.14.14)

when $h$ is small. The result of eliminating $C_n$ from these approximate relations
is then the extrapolation formula†
$$Y_{n+1} \approx y^{(h)}_{n+1} + \frac{y^{(h)}_{n+1} - y^{(2h)}_{n+1}}{2^r - 1}$$   (6.14.15)

Thus if, at certain stages of the advancing calculation, the newly calculated
ordinate $y_{n+1}$ is recomputed from $y_{n-1}$ with a doubled spacing, the truncation
error in the originally calculated value is approximated by the result of dividing
the difference between the two values by the factor $2^r - 1$, that is, by 3 in
(6.13.15), by 7 in (6.14.1) or (6.14.3), and by 15 in the formulas of fourth-
order accuracy.
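
The doubled-spacing estimate can be sketched for the fourth-order formula, for which the divisor is $2^4 - 1 = 15$. The names below and the test problem $y' = -y$, $y(0) = 1$ are assumptions made for illustration; the estimate is compared with the true error, which is known here only because the exact solution is available.

```python
import math

def rk4_step(F, x, y, h):
    """One step of the fourth-order formula (6.14.5)-(6.14.6)."""
    k0 = h * F(x, y)
    k1 = h * F(x + h / 2, y + k0 / 2)
    k2 = h * F(x + h / 2, y + k1 / 2)
    k3 = h * F(x + h, y + k2)
    return y + (k0 + 2 * k1 + 2 * k2 + k3) / 6

def doubled_step_estimate(step, F, x, y, h, r):
    """Return the two-step value y^(h) and the estimate of Y - y^(h)."""
    y_h = step(F, x + h, step(F, x, y, h), h)    # two steps of spacing h
    y_2h = step(F, x, y, 2 * h)                  # one step of spacing 2h
    return y_h, (y_h - y_2h) / (2 ** r - 1)

F = lambda x, y: -y
y_h, estimate = doubled_step_estimate(rk4_step, F, 0.0, 1.0, 0.1, r=4)
true_error = math.exp(-0.2) - y_h
print(estimate, true_error)
```

The estimate costs three additional evaluations of $F$ per check (the first evaluation is shared with the step of spacing $h$ if it is saved), so it is usually applied only at selected stages.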

It is apparent that an arbitrary change in spacing can be introduced at any 
stage of the forward progress, when a method of the Runge-Kutta type is used, 
without introducing any appreciable complication. 


† This is another example of so-called Richardson extrapolation (see Sec. 3.8).


6.15 Boundary-Value Problems


Problems in which the conditions to be satisfied by the solution of a differential 
equation, of order two or greater, are specified at both ends of an interval in 
which the solution is required are known as boundary-value problems and are 
generally much less amenable to numerical analysis than are initial-value 
problems, in which all conditions are imposed at one point. In this section, we 
consider briefly the application of certain elementary methods to the numerical 
solution of such problems. More efficient methods often can be based upon 
the result of reformulating the problem as an integral equation or as a problem 
in the calculus of variations, the treatment of both of which falls outside the 
scope of this work. 

For a linear problem, such as one governed by a second-order equation 
of the form 


$$y'' + P(x)y' + Q(x)y = F(x) \qquad (a < x < b)$$   (6.15.1)


and by the end conditions
$$y(a) = A \qquad y(b) = B$$   (6.15.2)


where A and B are prescribed, the analysis can be based on the principle of 
superposition. Thus, if $u(x)$ is any solution of the equation

$$u'' + Pu' + Qu = F$$   (6.15.3)
which satisfies the initial condition
$$u(a) = A$$   (6.15.4)

and $v(x)$ is any nontrivial solution of the equation

$$v'' + Pv' + Qv = 0$$   (6.15.5)
which satisfies the initial condition
$$v(a) = 0$$   (6.15.6)
then the function
$$Y(x) = u(x) + cv(x)$$   (6.15.7)

satisfies (6.15.1) and the condition $y(a) = A$ for any constant value of $c$. Further,
if $P$, $Q$, and $F$ are continuous on $[a, b]$, there cannot exist additional functions
having this property. Thus, a solution is found if $c$ can be determined such that

$$u(b) + cv(b) = B$$   (6.15.8)


Unless c can be so determined, no solution exists. 

If $P$, $Q$, and $F$ are continuous on $[a, b]$, the initial slopes $u'(a)$ and $v'(a)$
can be chosen arbitrarily, so long as $v'(a) \ne 0$; the choices $u'(a) = 0$ and
$v'(a) = 1$ are frequently convenient. It may be noticed that, if $u(b) = B$
and $v(b) = 0$, then $c$ is arbitrary and infinitely many solutions exist; if $u(b) \ne B$
and $v(b) = 0$, then no solution exists. Unless it happens that $v(b) = 0$, the
solution exists uniquely. Any of the previously discussed methods can be used 
in determining u and v. 
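
The superposition procedure can be sketched as two initial-value integrations followed by the determination of $c$. In the block below the fourth-order Runge-Kutta form for second-order equations is used as the integrator, although any of the earlier methods would serve; the function names and the test problem $y'' + y = x$, $y(0) = 0$, $y(1) = 2$ (solution $y = x + \sin x/\sin 1$) are assumptions made for illustration.

```python
import math

def integrate_second_order(G, a, b, y0, yp0, n):
    """March y'' = G(x, y, y') from a to b in n fourth-order steps; return all ordinates."""
    h = (b - a) / n
    y, yp, ys = y0, yp0, [y0]
    for i in range(n):
        x = a + i * h
        m0 = h * G(x, y, yp)
        m1 = h * G(x + h / 2, y + h * yp / 2, yp + m0 / 2)
        m2 = h * G(x + h / 2, y + h * yp / 2 + h * m0 / 4, yp + m1 / 2)
        m3 = h * G(x + h, y + h * yp + h * m1 / 2, yp + m2)
        y, yp = (y + h * yp + h * (m0 + m1 + m2) / 6,
                 yp + (m0 + 2 * m1 + 2 * m2 + m3) / 6)
        ys.append(y)
    return ys

def solve_linear_bvp(P, Q, F, a, b, A, B, n):
    """Combine u (with u(a) = A, u'(a) = 0) and v (with v(a) = 0, v'(a) = 1)."""
    u = integrate_second_order(lambda x, y, yp: F(x) - P(x) * yp - Q(x) * y, a, b, A, 0.0, n)
    v = integrate_second_order(lambda x, y, yp: -P(x) * yp - Q(x) * y, a, b, 0.0, 1.0, n)
    if v[-1] == 0.0:
        raise ValueError("v(b) = 0: the solution does not exist uniquely")
    c = (B - u[-1]) / v[-1]
    return [ui + c * vi for ui, vi in zip(u, v)]

ys = solve_linear_bvp(lambda x: 0.0, lambda x: 1.0, lambda x: x, 0.0, 1.0, 0.0, 2.0, 20)
print(ys[10], 0.5 + math.sin(0.5) / math.sin(1.0))   # computed and exact y(0.5)
```

The test `v[-1] == 0.0` is only a crude guard; in practice a value of $v(b)$ that is merely small already signals that $c$, and hence the solution, is poorly determined.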

In the case of a corresponding nonlinear problem, such as that governed
by an equation of the more general form

$$y'' = G(x, y, y') \qquad (a < x < b)$$   (6.15.9)
and the end conditions
$$y(a) = A \qquad y(b) = B$$   (6.15.10)

superposition generally is not valid. One possible procedure consists of defining
$u(x, \alpha)$ as the solution of the initial-value problem

$$\begin{aligned}
u'' &= G(x, u, u') \\
u(a) &= A \qquad u'(a) = \alpha
\end{aligned}$$   (6.15.11)

and attempting to determine $\alpha$ such that
$$u(b, \alpha) = B$$   (6.15.12)

For this purpose, $u(b)$ could be determined for two or more trial values of $\alpha$.
Then, by linear (or higher-order) inverse interpolation, an “improved” value
of $\alpha$ would be obtained, and the process would be iterated until (6.15.12) is
satisfactorily approximated if the iteration converges.

This “shooting” process is apt to be tedious, and is complicated by the
fact that (even in the linear case) small changes in $\alpha$ do not necessarily correspond
to small changes in $u(b, \alpha)$. Further, the basic questions of existence and
uniqueness of the solution are particularly troublesome in themselves, in the
general nonlinear case. There exists no completely satisfactory general method
(numerical or otherwise) for dealing with such problems. (Problems in which
the end conditions prescribe $y'$ or a linear combination of $y$ and $y'$, or which
are expressed in a more complicated way, involve more or less obvious
modifications.)
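
A minimal sketch of the shooting process with linear inverse interpolation (the secant rule) is given below. Everything specific in it is an assumption made for illustration: the function names, the fourth-order Runge-Kutta integrator, the tolerances, and the test problem $y'' = 2y^3$, $y(0) = 1$, $y(1) = \tfrac{1}{2}$, one solution of which is $y = 1/(x + 1)$ with initial slope $-1$.

```python
def endpoint_value(G, a, b, A, alpha, n):
    """Integrate u'' = G(x, u, u'), u(a) = A, u'(a) = alpha and return u(b)."""
    h = (b - a) / n
    y, yp = A, alpha
    for i in range(n):
        x = a + i * h
        m0 = h * G(x, y, yp)
        m1 = h * G(x + h / 2, y + h * yp / 2, yp + m0 / 2)
        m2 = h * G(x + h / 2, y + h * yp / 2 + h * m0 / 4, yp + m1 / 2)
        m3 = h * G(x + h, y + h * yp + h * m1 / 2, yp + m2)
        y, yp = (y + h * yp + h * (m0 + m1 + m2) / 6,
                 yp + (m0 + 2 * m1 + 2 * m2 + m3) / 6)
    return y

def shoot(G, a, b, A, B, alpha0, alpha1, n=100, tol=1e-10, max_iter=25):
    """Adjust the initial slope by linear inverse interpolation until u(b) is near B."""
    miss = lambda alpha: endpoint_value(G, a, b, A, alpha, n) - B
    g0, g1 = miss(alpha0), miss(alpha1)
    for _ in range(max_iter):
        if abs(g1) <= tol:
            return alpha1
        if g1 == g0:
            break                      # interpolation is impossible
        alpha0, alpha1 = alpha1, alpha1 - g1 * (alpha1 - alpha0) / (g1 - g0)
        g0, g1 = g1, miss(alpha1)
    raise RuntimeError("shooting iteration did not converge")

slope = shoot(lambda x, u, up: 2.0 * u ** 3, 0.0, 1.0, 1.0, 0.5, -0.8, -1.2)
print(slope)
```

The two trial slopes were chosen close to the expected value; with poorer trial values the iteration may converge to a different solution or fail, in line with the cautions stated above.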

Another class of methods, which usually are convenient only when the
problem is linear, consists of approximating the differential equation by a
difference equation and of solving the simultaneous set of algebraic equations 
resulting from the requirement that this equation be satisfied at each of a set of 
equally spaced points in the relevant interval. 

In illustration, any linear second-order equation can be transformed to an 
equation of the form 


$$y'' + f(x)y = g(x)$$   (6.15.13)
as was pointed out at the end of Sec. 6.10. Reference to Eq. (5.6.7) yields the
relation
$$y_{n+1} - 2y_n + y_{n-1} = h^2\left(1 + \tfrac{1}{12}\delta^2\right)y''_n + T_n$$   (6.15.14)

where the truncation error $T_n$ is expressible in the form
$$T_n = -\frac{h^6}{240}\,y^{\mathrm{vi}}(\xi) \qquad (x_{n-1} < \xi < x_{n+1})$$   (6.15.15)

Thus, if we use (6.15.13) to replace $y''$ by $g - fy$, (6.15.14) takes the form

$$\left(1 + \frac{h^2}{12}f_{n+1}\right)y_{n+1} - 2\left(1 - \frac{5h^2}{12}f_n\right)y_n + \left(1 + \frac{h^2}{12}f_{n-1}\right)y_{n-1} = \frac{h^2}{12}(g_{n+1} + 10g_n + g_{n-1}) + T_n$$   (6.15.16)
Now, if the interval $[a, b]$ is divided into $N + 1$ equal parts, in such a
way that $x_0 = a$, $x_1 = a + h$, ..., $x_N = a + Nh$, $x_{N+1} = b$, where $h = (b - a)/(N + 1)$,
we may require the result of ignoring $T_n$ in (6.15.16) to hold
for $n = 1, 2, \ldots, N$, and so obtain a set of $N$ simultaneous linear algebraic
equations in $y_1, y_2, \ldots, y_N$, of the form

$$\begin{aligned}
\left(1 + \tfrac{h^2}{12}f_0\right)y_0 - 2\left(1 - \tfrac{5h^2}{12}f_1\right)y_1 + \left(1 + \tfrac{h^2}{12}f_2\right)y_2 &= \tfrac{h^2}{12}G_1 \\
\left(1 + \tfrac{h^2}{12}f_1\right)y_1 - 2\left(1 - \tfrac{5h^2}{12}f_2\right)y_2 + \left(1 + \tfrac{h^2}{12}f_3\right)y_3 &= \tfrac{h^2}{12}G_2 \\
&\;\;\vdots \\
\left(1 + \tfrac{h^2}{12}f_{N-1}\right)y_{N-1} - 2\left(1 - \tfrac{5h^2}{12}f_N\right)y_N + \left(1 + \tfrac{h^2}{12}f_{N+1}\right)y_{N+1} &= \tfrac{h^2}{12}G_N
\end{aligned}$$   (6.15.17)

where
$$G(x) = g(x + h) + 10g(x) + g(x - h)$$

supplemented by the prescribed conditions $y_0 = A$ and $y_{N+1} = B$.
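
The set (6.15.17) is tridiagonal, so it can be solved by a single elimination sweep followed by back substitution. The sketch below makes several assumptions that are not in the text: the function name `solve_bvp_6_15_17`, the use of elimination without pivoting, and the test problem $y'' + y = x$, $y(0) = 0$, $y(1) = 2$ (solution $y = x + \sin x/\sin 1$).

```python
import math

def solve_bvp_6_15_17(f, g, a, b, A, B, N):
    """Solve y'' + f(x) y = g(x), y(a) = A, y(b) = B from (6.15.17) with T_n ignored."""
    h = (b - a) / (N + 1)
    x = [a + n * h for n in range(N + 2)]
    s = h * h / 12.0
    lower = [1.0 + s * f(x[n - 1]) for n in range(1, N + 1)]       # multiplies y_{n-1}
    diag = [-2.0 * (1.0 - 5.0 * s * f(x[n])) for n in range(1, N + 1)]
    upper = [1.0 + s * f(x[n + 1]) for n in range(1, N + 1)]       # multiplies y_{n+1}
    rhs = [s * (g(x[n + 1]) + 10.0 * g(x[n]) + g(x[n - 1])) for n in range(1, N + 1)]
    rhs[0] -= lower[0] * A       # the known y_0 is moved to the right-hand member
    rhs[-1] -= upper[-1] * B     # likewise the known y_{N+1}
    for i in range(1, N):        # forward elimination
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    y = [0.0] * N                # back substitution
    y[-1] = rhs[-1] / diag[-1]
    for i in range(N - 2, -1, -1):
        y[i] = (rhs[i] - upper[i] * y[i + 1]) / diag[i]
    return x, [A] + y + [B]

xs, ys = solve_bvp_6_15_17(lambda x: 1.0, lambda x: x, 0.0, 1.0, 0.0, 2.0, 9)
worst = max(abs(yv - (xv + math.sin(xv) / math.sin(1.0))) for xv, yv in zip(xs, ys))
print(worst)
```

Elimination without pivoting is adequate here because the diagonal coefficients dominate when $h^2 f$ is small; for other data a pivoting strategy may be needed.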

By virtue of (6.15.15), this procedure is of fifth order, in the sense that
it would afford exact results if $y(x)$ were a polynomial of degree 5 or less.
A simpler procedure, of third order, corresponds to the neglect of second
differences in (6.15.14) and hence consists of solving the simultaneous equations

$$y_{n+1} - 2\left(1 - \frac{h^2}{2}f_n\right)y_n + y_{n-1} = h^2g_n \qquad (n = 1, 2, \ldots, N)$$   (6.15.18)

supplemented by the boundary conditions.

On the other hand, if fourth differences are retained in (5.6.7), the corresponding
equations are easily obtained in the form

$$\begin{aligned}
&-\frac{h^2}{240}f_{n+2}y_{n+2} + \left(1 + \frac{h^2}{10}f_{n+1}\right)y_{n+1} - 2\left(1 - \frac{97h^2}{240}f_n\right)y_n \\
&\qquad + \left(1 + \frac{h^2}{10}f_{n-1}\right)y_{n-1} - \frac{h^2}{240}f_{n-2}y_{n-2} \\
&\qquad = \frac{h^2}{240}(-g_{n+2} + 24g_{n+1} + 194g_n + 24g_{n-1} - g_{n-2})
\end{aligned}$$   (6.15.19)


and would reduce to identities for $n = 2, 3, \ldots, N - 1$ if $y(x)$ were a polynomial
of degree 7 or less. For $n = 1$ and $n = N$, Eq. (6.15.19) would involve the
irrelevant quantities $y_{-1}$ and $y_{N+2}$. Two additional “off-center” relations, which
would also be satisfied exactly by any polynomial solution of degree 7 or
less, are thus needed. They may be obtained, for example, by retaining fifth
differences in the backward-difference formula (5.5.12), relative to $\nabla^2 y_{N+1}$, and
in the corresponding forward-difference formula relative to $\Delta^2 y_0$, in the forms†

$$\begin{aligned}
&\left(1 + \frac{3h^2}{40}f_0\right)y_0 - 2\left(1 - \frac{209h^2}{480}f_1\right)y_1 + \left(1 + \frac{h^2}{60}f_2\right)y_2 \\
&\qquad + \frac{7h^2}{120}f_3y_3 - \frac{h^2}{40}f_4y_4 + \frac{h^2}{240}f_5y_5 \\
&\qquad = \frac{h^2}{240}(18g_0 + 209g_1 + 4g_2 + 14g_3 - 6g_4 + g_5)
\end{aligned}$$   (6.15.20)
and

$$\begin{aligned}
&\frac{h^2}{240}f_{N-4}y_{N-4} - \frac{h^2}{40}f_{N-3}y_{N-3} + \frac{7h^2}{120}f_{N-2}y_{N-2} + \left(1 + \frac{h^2}{60}f_{N-1}\right)y_{N-1} \\
&\qquad - 2\left(1 - \frac{209h^2}{480}f_N\right)y_N + \left(1 + \frac{3h^2}{40}f_{N+1}\right)y_{N+1} \\
&\qquad = \frac{h^2}{240}(g_{N-4} - 6g_{N-3} + 14g_{N-2} + 4g_{N-1} + 209g_N + 18g_{N+1})
\end{aligned}$$   (6.15.21)


The $N - 2$ equations (6.15.19), the two equations (6.15.20) and (6.15.21),
and the two prescribed end conditions serve to determine (approximately)
the values of the $N + 2$ ordinates $y_0, y_1, \ldots, y_N, y_{N+1}$. Here the interval
$[a, b]$ must be divided into at least five equal parts.


† The same relations can be obtained by using the approximate relations
$$\Delta^6 y''_{-1} = \Delta^6(g_{-1} - f_{-1}y_{-1}) = 0 \qquad \nabla^6 y''_{N+2} = \nabla^6(g_{N+2} - f_{N+2}y_{N+2}) = 0$$

to eliminate the ordinates $y_{-1}$ and $y_{N+2}$ from the equations which correspond to
setting $n = 1$ and $n = N$ in (6.15.19).

If the prescribed end condition at $x = a = x_0$ involves $y'(a)$ in place
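of (or in combination with) $y(a)$, that condition can be replaced by an appropriate
approximate one, involving $y_0$ and $y_1$, by means of the result of
retaining terms involving powers of $h$ through $h^n$ (where $n$ is the order of the
procedure used) in the expansion

$$\Delta y_0 = \left(hD + \frac{1}{2!}h^2D^2 + \frac{1}{3!}h^3D^3 + \cdots\right)y_0$$

combined with (6.15.13) to give
$$\begin{aligned}
y_1 - y_0 = hy'_0 &+ \frac{h^2}{2}(g_0 - f_0y_0) + \frac{h^3}{6}(g'_0 - f_0y'_0 - f'_0y_0) \\
&+ \frac{h^4}{24}(g''_0 - f''_0y_0 - 2f'_0y'_0 - f_0g_0 + f_0^2y_0) + \cdots
\end{aligned}$$
or

$$\begin{aligned}
y_1 = {} & \left[1 - \frac{h^2}{2}f_0 - \frac{h^3}{6}f'_0 - \frac{h^4}{24}(f''_0 - f_0^2) - \frac{h^5}{120}(f'''_0 - 4f_0f'_0) - \cdots\right]y_0 \\
& + hy'_0\left[1 - \frac{h^2}{6}f_0 - \frac{h^3}{12}f'_0 - \frac{h^4}{120}(3f''_0 - f_0^2) - \cdots\right] \\
& + \left[\frac{h^2}{2}g_0 + \frac{h^3}{6}g'_0 + \frac{h^4}{24}(g''_0 - f_0g_0) + \frac{h^5}{120}(g'''_0 - 3f'_0g_0 - f_0g'_0) + \cdots\right]
\end{aligned}$$   (6.15.22)

A similar relation, for use at $x = b = x_{N+1}$, is obtainable from the expansion

$$\nabla y_{N+1} = \left(hD - \frac{1}{2!}h^2D^2 + \frac{1}{3!}h^3D^3 - \frac{1}{4!}h^4D^4 + \cdots\right)y_{N+1}$$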


6.16 Linear Characteristic-Value Problems 


When $g(x) = 0$ in (6.15.13), and the prescribed end conditions are of the special
form $y(a) = y(b) = 0$, one solution of the problem is clearly the trivial solution
$y(x) \equiv 0$. It frequently happens that an arbitrary constant parameter $\lambda$ is
linearly involved in the definition of the function $f(x)$ and that it is then desired
to determine values of $\lambda$ for which the problem also admits a nontrivial solution.
Such values of $\lambda$ are known as its characteristic values (or eigenvalues), and the
corresponding solutions are called the characteristic functions of the problem.
The study of their properties and applications comprises an important field of
mathematics, and a great variety of methods have been (and are being) devised
for their approximate numerical treatment.

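One such method can be based on the result of appropriately specializing
(6.15.17). Thus, if the problem is of the form

$$\begin{aligned}
y'' + [q(x) + \lambda r(x)]y &= 0 \\
y(a) = y(b) &= 0
\end{aligned}$$   (6.16.1)

we may replace $f$ by $q + \lambda r$ in (6.15.17), to obtain a set of $N$ equations of the
form

$$\begin{aligned}
-2\left[\left(1 - \tfrac{5h^2}{12}q_1\right) - \tfrac{5h^2}{12}r_1\lambda\right]y_1 + \left[\left(1 + \tfrac{h^2}{12}q_2\right) + \tfrac{h^2}{12}r_2\lambda\right]y_2 &= 0 \\
\left[\left(1 + \tfrac{h^2}{12}q_1\right) + \tfrac{h^2}{12}r_1\lambda\right]y_1 - 2\left[\left(1 - \tfrac{5h^2}{12}q_2\right) - \tfrac{5h^2}{12}r_2\lambda\right]y_2 + \left[\left(1 + \tfrac{h^2}{12}q_3\right) + \tfrac{h^2}{12}r_3\lambda\right]y_3 &= 0 \\
&\;\;\vdots \\
\left[\left(1 + \tfrac{h^2}{12}q_{N-1}\right) + \tfrac{h^2}{12}r_{N-1}\lambda\right]y_{N-1} - 2\left[\left(1 - \tfrac{5h^2}{12}q_N\right) - \tfrac{5h^2}{12}r_N\lambda\right]y_N &= 0
\end{aligned}$$   (6.16.2)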

This set of homogeneous linear equations will admit a nontrivial solution
for $y_1, y_2, \ldots, y_N$ if and only if the determinant of the matrix of coefficients
vanishes (see Sec. 10.2), a requirement which demands that $\lambda$ be a root of an
algebraic equation of degree $N$ if no one of the values of $r_i$ vanishes, as is
generally the case in practice. For each such value of $\lambda$, this set of equations
becomes redundant and (at least) one equation can be ignored, after which the 
remaining equations can be solved for the ratios of certain of the ordinates to 
the remaining one or ones. Except in unusual cases, only one of the ordinates 
(but, usually, any one) can be chosen arbitrarily, and the ratios of the remaining 
ones to that one are determinate. In this way, approximations to N of the 
characteristic numbers (generally the N smallest ones) of the true problem are 
obtained, together with ordinates of the corresponding characteristic functions, 
defined within a common arbitrary multiplicative factor.

The crudest approximation is obtained by taking $N = 1$, so that only
the central ordinate $y_1$, at $x = (a + b)/2$, is involved, and $y_0 = y_2 = 0$.
Thus only the equation
$$-2\left[\left(1 - \frac{5h^2}{12}q_1\right) - \frac{5h^2}{12}r_1\lambda\right]y_1 = 0$$
is obtained, and the requirement $y_1 \ne 0$ leads to the approximation

$$\lambda_1^{(1)} = \frac{12}{5h^2r_1}\left(1 - \frac{5h^2}{12}q_1\right) \qquad h = \frac{b - a}{2}$$   (6.16.3)

to the smallest characteristic number $\lambda_1$. The ordinate $y_1$ is then indeterminate.
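When $N = 2$, the two permissible values of $\lambda$ are found to be the roots
of the determinantal equation

$$\begin{vmatrix}
-2\left[\left(1 - \frac{5h^2}{12}q_1\right) - \frac{5h^2}{12}r_1\lambda\right] & \left(1 + \frac{h^2}{12}q_2\right) + \frac{h^2}{12}r_2\lambda \\
\left(1 + \frac{h^2}{12}q_1\right) + \frac{h^2}{12}r_1\lambda & -2\left[\left(1 - \frac{5h^2}{12}q_2\right) - \frac{5h^2}{12}r_2\lambda\right]
\end{vmatrix} = 0$$   (6.16.4)

where $h = (b - a)/3$, and may be denoted by $\lambda_1^{(2)}$ and $\lambda_2^{(2)}$. For each of these
calculated values of $\lambda$, there follows also, from the first equation,

$$\frac{y_2}{y_1} = 2\,\frac{(1 - 5h^2q_1/12) - (5h^2r_1/12)\lambda}{(1 + h^2q_2/12) + (h^2r_2/12)\lambda}$$   (6.16.5)
with $y_1$ arbitrary. The use of the second equation would lead to an equivalent
result.

In illustration, the problem

$$Y'' + 2Y' + \lambda xY = 0 \qquad Y(0) = Y(1) = 0$$   (6.16.6)

is transformed to the problem
$$y'' - y + \lambda xy = 0 \qquad y(0) = y(1) = 0$$   (6.16.7)
with the change of variables
$$Y = e^{-x}y$$   (6.16.8)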


in accordance with (6.10.23). With $q(x) = -1$ and $r(x) = x$, Eq. (6.16.3)
yields

$$\lambda_1^{(1)} = \tfrac{96}{5}\left(1 + \tfrac{5}{48}\right) = \tfrac{106}{5} = 21.2$$

Equation (6.16.4) becomes
$$\begin{vmatrix}
-2\,\dfrac{339 - 5\lambda}{324} & \dfrac{321 + 2\lambda}{324} \\
\dfrac{321 + \lambda}{324} & -2\,\dfrac{339 - 10\lambda}{324}
\end{vmatrix} = 0$$

and expands into the relevant characteristic equation

$$22\lambda^2 - 2367\lambda + 39627 = 0$$

to yield a second approximation $\lambda_1^{(2)} \approx 20.74$ to $\lambda_1$ and a first approximation
$\lambda_2^{(2)} \approx 86.85$ to $\lambda_2$. Equation (6.16.5) becomes

$$\frac{y_2}{y_1} = \frac{678 - 10\lambda}{321 + 2\lambda}$$

and yields $y_2/y_1 \approx 1.30$ for $\lambda_1^{(2)}$ and $y_2/y_1 \approx -0.385$ for $\lambda_2^{(2)}$. Thus, from
(6.16.8) there follows $Y(\tfrac{2}{3})/Y(\tfrac{1}{3}) \approx 0.93$ in the first “mode” and $-0.28$ in the
second one.
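
The two-point calculation can be reproduced mechanically. The sketch below expands the determinant of (6.16.4) in exact rational arithmetic into a quadratic $c_2\lambda^2 + c_1\lambda + c_0 = 0$ and then solves it; the function names and the representation of each matrix element as a pair (constant part, coefficient of $\lambda$) are choices made here for illustration only.

```python
import math
from fractions import Fraction

def two_point_characteristic(q, r, a, b):
    """Coefficients (c2, c1, c0) of the quadratic in lambda from (6.16.4), N = 2."""
    h = Fraction(b - a, 3)
    x1, x2 = a + h, a + 2 * h
    s = h * h / 12
    # each element is stored as (constant part, coefficient of lambda)
    d1 = (-2 * (1 - 5 * s * q(x1)), 10 * s * r(x1))   # row 1, column 1
    u1 = (1 + s * q(x2), s * r(x2))                   # row 1, column 2
    l2 = (1 + s * q(x1), s * r(x1))                   # row 2, column 1
    d2 = (-2 * (1 - 5 * s * q(x2)), 10 * s * r(x2))   # row 2, column 2
    c2 = d1[1] * d2[1] - u1[1] * l2[1]
    c1 = d1[0] * d2[1] + d1[1] * d2[0] - u1[0] * l2[1] - u1[1] * l2[0]
    c0 = d1[0] * d2[0] - u1[0] * l2[0]
    return c2, c1, c0

def real_roots(c2, c1, c0):
    """Both roots of the quadratic, smaller first (a positive discriminant is assumed)."""
    disc = math.sqrt(float(c1 * c1 - 4 * c2 * c0))
    return sorted(((-float(c1) - disc) / (2 * float(c2)),
                   (-float(c1) + disc) / (2 * float(c2))))

# the illustrative problem: q(x) = -1, r(x) = x on [0, 1]
c2, c1, c0 = two_point_characteristic(lambda x: -1, lambda x: x, 0, 1)
print(c1 / c2, c0 / c2)          # normalized coefficients of the quadratic
print(real_roots(c2, c1, c0))
```

For three or more interior ordinates the same bookkeeping becomes unwieldy, which is one reason for preferring iterative techniques that avoid expanding the determinant.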

If three interior ordinates were used, to afford improved approximations
to $\lambda_1$ and $\lambda_2$, and a first approximation to $\lambda_3$, it would seem to be necessary to
expand a determinant of third order and to determine the roots of a cubic
equation. However, various iterative techniques for determining the roots of the 
relevant characteristic equation without explicitly expanding the determinant, 
in such cases, exist in the literature (see Sec. 6.18). 

A simpler procedure would be based on the use of (6.15.18), rather than 
(6.15.16), whereas a more elaborate procedure could be based on (6.15.19) 
to (6.15.21). Modifications, which are appropriate to situations when a linear 
combination of y and y’ is required to vanish at each end of the interval, may 
be based on the use of (6.15.22) and a similar equation relevant to $x = b$,
with $f = q + \lambda r$ and $g = 0$, in place of the conditions $y_0 = y_{N+1} = 0$.

In the simple special case when $q(x) = 0$ and $r(x) = 1$, so that (6.16.1)
reduces to $y'' + \lambda y = 0$, where $y(a) = y(b) = 0$, the exact value of the rth
characteristic value is easily found to be

$$\lambda_r = \frac{r^2\pi^2}{(b - a)^2}$$

The approximation $\bar{\lambda}_r$ afforded by use of the simpler procedure is found to be
$$\bar{\lambda}_r = \frac{4}{h^2}\sin^2\left[\frac{h}{2}\sqrt{\lambda_r}\right]$$
whereas that afforded by use of (6.16.2) can be shown to be

$$\bar{\lambda}_r = \frac{4}{h^2}\,\frac{\sin^2[(h/2)\sqrt{\lambda_r}]}{1 - \tfrac{1}{3}\sin^2[(h/2)\sqrt{\lambda_r}]}$$


where $h = (b - a)/(N + 1)$, from which results the nature of the approximations
to the first $N$ characteristic numbers can be determined in the two cases.
In particular, the error in the former case is found to be positive and less than
$h^2\lambda_r^2/12$, whereas that in the latter case is positive and approximately $h^4\lambda_r^3/240$ when $h$ is small.
The error associated with the use of the more elaborate seventh-order procedure
would be of the order of $h^6\lambda_r^4$. These facts permit crude preliminary estimates
of the requisite number of subdivisions in similar (but less simple) cases.
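
For the special case $y'' + \lambda y = 0$, $y(a) = y(b) = 0$, the two approximations follow in closed form from substituting $y_n = \sin(nh\sqrt{\lambda_r})$ into (6.15.18) and into (6.16.2), respectively. The sketch below evaluates them and the corresponding errors; the function names are illustrative, and the quantities $h^2\lambda_r^2/12$ and $h^4\lambda_r^3/240$ are used only as the leading terms of the respective error expansions.

```python
import math

def exact_lambda(r, a, b):
    """Exact r-th characteristic value of y'' + lambda y = 0, y(a) = y(b) = 0."""
    return (r * math.pi / (b - a)) ** 2

def simple_approx(r, a, b, N):
    """Approximation from the simpler procedure based on (6.15.18)."""
    h = (b - a) / (N + 1)
    s = math.sin(0.5 * h * math.sqrt(exact_lambda(r, a, b))) ** 2
    return 4.0 * s / h ** 2

def fifth_order_approx(r, a, b, N):
    """Approximation from the set (6.16.2)."""
    h = (b - a) / (N + 1)
    s = math.sin(0.5 * h * math.sqrt(exact_lambda(r, a, b))) ** 2
    return 4.0 * s / (h ** 2 * (1.0 - s / 3.0))

a, b, N, r = 0.0, 1.0, 2, 1
h = (b - a) / (N + 1)
lam = exact_lambda(r, a, b)
err_simple = lam - simple_approx(r, a, b, N)
err_fifth = lam - fifth_order_approx(r, a, b, N)
print(err_simple, h ** 2 * lam ** 2 / 12)
print(err_fifth, h ** 4 * lam ** 3 / 240)
```

With $N = 1$ the second function returns $12/(5h^2)$, in agreement with (6.16.3) for $q = 0$ and $r = 1$.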


6.17 Selection of a Method 


Whereas a rather large number of methods for dealing with initial-value prob- 
lems have been outlined in this chapter, it should be remarked that a very 
substantial number of additional variations may also be found in the literature. 
The problem of deciding which one of these methods is most appropriate in a 
specific situation is rather difficult to discuss because of the large number of 
factors which may affect the decision. 

First of all, the choice will depend upon the nature of the computational 
device to be used. Thus, for example, a method which is well adapted to the use 
of a desk calculator may be inconvenient for hand calculation because of the 
fact that it involves too many operations with multidigit numbers; or a procedure 
which involves a large number of iterations of a relatively simple technique
may be remarkably well adapted to an automatic high-speed calculator, but its 
use may entail a prohibitive amount of time when the calculations are to be 
made on a desk calculator. 

A procedure which involves a large number of evaluations of a certain 
function F(x, y) may not be objectionable for desk calculation if F(x, y) is 
well tabulated but may require an excessive amount of time in machine calcula- 
tion. On the other hand, the situation may be completely reversed if the function 
is of complicated analytical form but if a subroutine is available for generating 
it directly in the computer.

Stability considerations may be of great importance, for a given problem, 
when a large number of steps is to be taken, but may be much less significant 
when only a relatively small number of ordinates is required, or when a different 
problem is dealt with. The computational advantages associated with a simple 
procedure with relatively large truncation error must be weighed against the 
fact that the use of that procedure generally requires a small spacing and a
correspondingly large number of steps, and hence increases the importance of
the effects of roundoff errors. At the same time, such a procedure may be 
preferred to a more elaborate one when its use permits a fairly confident estimate 
of an upper bound on the total propagated error, whereas such an estimate 
corresponding to the use of the more elaborate procedure is not readily available. 

One may be faced with the problem of choosing the procedure which is 
most appropriate in the solution of a single differential equation, or that which 
appears to be best on the average for a wide class of equations. 

The methods described in this chapter, for advancing the solution, fall 
into three broad classes: (1) methods which express the future ordinate as a 
linear combination of present and/or past ordinates and slopes (Secs. 6.5, 6.6, 
6.9, and 6.10); (2) methods which also involve the calculation of certain higher 
derivatives (Sec. 6.12); and (3) methods in which the determination of the 
future ordinate does not involve memory of the past (Secs. 6.13 and 6.14). 

The Euler procedure and its modification of closed type are the simplest 
of the procedures in the first class. The modified Adams method and the special 
fourth-order Milne and Hamming methods are perhaps the most frequently 
used procedures of higher-order accuracy in this class. In each case, the relevant 
formulas for prediction and correction may be expressed either in terms of 
differences of slopes or in terms of the slopes themselves. The Milne procedure 
requires fewer values of the derivative than the corresponding Adams procedure, 
but it compares unfavorably with the latter and with Hamming’s method from 
the point of view of stability. Except for the simple Euler methods, the pro- 
cedures in this class are not self-starting. 

The methods of the second class are highly efficient when and only 
when the differential equation is of such a form that analytical relations between 
higher derivatives of the unknown function and the function itself are readily 
obtained. If derivatives at only two points are used, the special methods actually 
treated here are effectively self-starting. 

The Runge-Kutta methods, of the third class, possess the advantage that, 
since their use at each stage of the advancing calculation does not require 
information relevant to past stages, they are completely self-starting and are 
particularly appropriate when memory requirements are to be minimized. 
Furthermore, these procedures are inherently stable and are such that a change 
in spacing is easily effected at any stage of the advance. The principal disad- 
vantage consists of the fact that each forward step entails several evaluations of 
the “right-hand member” of the differential equation, a fact which may be of 
considerable importance when each such evaluation is difficult or time-consum- 
ing. In addition, the nonexistence of a tractable expression for the associated 
truncation error, in most cases, is a source of some inconvenience.

For the purpose of starting a solution, when a method of the first class
is to be used for advancing the solution, one may choose among the methods of 
the second and third classes as well as the methods of Sec. 6.4. When the right- 
hand member of the differential equation is of such a form that the formation of 
higher derivatives is readily effected, the use of Taylor series (Sec. 6.4) or of 
series which involve values of higher derivatives at two points (Sec. 6.12) is 
often convenient. Otherwise, resort may be had to one of the iterative methods 
of Sec. 6.4 or to the methods of Runge-Kutta type (Secs. 6.13 and 6.14). 


6.18 Supplementary References 


General texts on the numerical solution of ordinary differential equations include 
Milne [1970], Collatz [1960], Fox [1962], Henrici [1962, 1963], Babuska, 
Prager, and Vitasek [1966], and Lapidus and Seinfeld [1971], each of which 
provides a useful bibliography. For theoretical considerations and analytical 
methods, see texts such as Ince [1952], Coddington and Levinson [1955], 
and Birkhoff and Rota [1969]. Daniel and Moore [1970] combine numerical 
and analytical considerations in a rather unusual way. 

The basic papers on the stability of step-by-step processes are those of 
Rutishauser [1952] and Dahlquist [1956]. For references to later contributions, 
see the texts listed above. Roundoff-error propagation is studied in Henrici 
[1962]. 

Numerical techniques for dealing with boundary-value problems governed 
by ordinary differential equations are dealt with in Fox [1957] and Keller 
[1968]. 


PROBLEMS 


Section 6.2 


1 Show that the operator affecting $y'$ in the open formula (6.2.12) relating $y_{n+1}$ and
$y_{n-p}$ can be obtained by multiplying the one corresponding to $p = 0$ in (6.2.10) by

$$\frac{1 - (1 - \nabla)^{p+1}}{\nabla} = (p + 1) - \frac{(p + 1)p}{2!}\nabla + \cdots$$
and use this method to derive (6.2.13) to (6.2.15), as well as the formulas
corresponding to $p = 2$ and $p = 4$.
2 Verify that the results of terminating (6.2.13) to (6.2.15) with the zeroth, second,
and fourth differences, respectively, are Newton-Cotes formulas of open type.


Section 6.3


3 Show that the method used in Prob. 1 also applies to the closed formulas, and
derive the formulas relating $y_{n+1}$ and $y_{n-p}$ from (6.3.2) for $p = 1, 2, 3, 4$, and 5
in this way.

4 Verify that the results of terminating (6.3.3) and (6.3.4) with the second and fourth
differences, respectively, are Newton-Cotes formulas of closed type.


Section 6.4 


5 Obtain additional values of y, corresponding to x = ±0.1 and ±0.2, for each of
the following problems, by use of power series, rounding the results to five
decimal places:

(a) $y' + y = 0$, $y(0) = 1$

(b) $y' + 2xy = 2x^3$, $y(0) = 0$

(c) $y' + y + xy^2 = 0$, $y(0) = 1$

(d) $xy' = 1 - y + x^2y^2$, $y(0) = 1$

[The respective analytical solutions are $e^{-x}$, $e^{-x^2} - 1 + x^2$, $(2e^x - 1 - x)^{-1}$,
and $(\tan x)/x$.]

6 (a) to (d) Proceed as in Prob. 5 by Picard’s method.

7 (a) to (d) Proceed as in Prob. 5 by use of Eqs. (6.4.14).

8 Obtain additional starting values of y when x = 0.1, 0.2, 0.3, and 0.4 for the
following problem, by use of Eqs. (6.4.13), assuming that only the tabulated values
of $\phi(x)$ are available and rounding the results to five decimal places:

$$y' + xy = \phi(x) \qquad y(0) = 1$$

x     φ(x)       x     φ(x)
0.0   1.00000    0.6   1.16412
0.1   1.00499    0.7   1.21579
0.2   1.01980    0.8   1.27059
0.3   1.04399    0.9   1.32660
0.4   1.07683    1.0   1.38177
0.5   1.11730

[Here $\phi(x) \approx \cos x + x \sin x$, and hence $y(x) \approx e^{-x^2/2} + \sin x$.]


Section 6.5 


9 (a) to (d) Advance the calculation of the solutions of Prob. 5 to x = 1 with
h = 0.1 by use of the Adams method, rounding all calculated ordinates to five
decimal places and estimating the errors.

10 (a) to (d) Proceed as in Prob. 9, using Eq. (6.2.18).

11 Advance the calculations of Prob. 8 to x = 1 with h = 0.1, (a) by use of the
Adams method and (b) by use of (6.2.18), rounding all calculated ordinates to
five decimal places and estimating the errors.


Section 6.6


12 (a) to (e) Recalculate the ordinates required in Probs. 9a to d and 11 by use
of the modified Adams method, again retaining five places and estimating the
errors.

13 (a) to (e) Proceed as in Prob. 12 by use of Milne’s method.


Section 6.7 


14 Show that the approximate solution of the problem

$$y' + y = 0 \qquad y(0) = 1$$

afforded by the result of retaining first differences in the formula (6.3.2) would be
of the form

$$y_n = \left(\frac{2 - h}{2 + h}\right)^n$$

if no roundoffs were effected, and that corresponding to retention of first
differences in (6.3.3) would be of the form

$$y_n = \frac{1}{2\sqrt{1 + h^2}}\left[\left(\sqrt{1 + h^2} + h + y_1\right)\left(\sqrt{1 + h^2} - h\right)^n + (-1)^n\left(\sqrt{1 + h^2} - h - y_1\right)\left(\sqrt{1 + h^2} + h\right)^n\right]$$

where $y_1$ is the independently calculated approximation to the true value $e^{-h}$,
whereas the exact solution is given by $y_n = e^{-nh}$.
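15 Show that, when h is small, the solutions obtained in Prob. 14 can be expressed
in the forms

$$y_n = \left(e^{-h} - \tfrac{1}{12}h^3 + \cdots\right)^n$$

and

$$y_n = \left[\left(1 - \tfrac{1}{12}h^3 + \cdots\right) - \tfrac{1}{2}\varepsilon\left(1 - \tfrac{1}{2}h^2 + \cdots\right)\right]\left(e^{-h} + \tfrac{1}{6}h^3 + \cdots\right)^n + (-1)^n\left[\left(\tfrac{1}{12}h^3 + \cdots\right) + \tfrac{1}{2}\varepsilon\left(1 - \tfrac{1}{2}h^2 + \cdots\right)\right]\left(e^{h} - \tfrac{1}{6}h^3 + \cdots\right)^n$$

where $\varepsilon$ represents the error associated with the value employed for $y_1$, and where
omitted terms in each expansion are small, of order $h^4$.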

16 Suppose that a spacing h = 0.1 is used in the approximations of Prob. 14, and
that the value of $y_1$ used in the second calculation is assumed to be free of error.
Calculate the errors and relative errors in the two approximations for values of n
in the neighborhood of 10, 50, and 100, neglecting the effects of roundoff errors.
17 Show that when Milne’s method is used, the parasitic part $u_n = u(x_n)$ of the
approximate solution of the problem

$$y' = Ay \qquad y(0) = y_0$$

where A is a constant, is approximated by

$$u_n \approx (-1)^n\left(\frac{A^5h^5}{360}\,y_0 + \frac{\varepsilon}{2}\right)e^{-Ax_n/3}$$

when |Ah| is small, where $\varepsilon$ is the error inherent in $y_1$.
18 For Hamming’s corrector (6.7.20), show that the characteristic equation
corresponding to (6.7.6) is

$$\left(1 - \tfrac{3}{8}Ah\right)\beta^3 - \left(\tfrac{9}{8} + \tfrac{3}{4}Ah\right)\beta^2 + \tfrac{3}{8}Ah\,\beta + \tfrac{1}{8} = 0$$

Verify that when h = 0 the roots are 1 and $(1 \pm \sqrt{33})/16$. Then show that the
Hamming simulation to the solution of (6.7.3) is approximately of the form

$$y_n = c_0(1 + Ah + \cdots)^n + c_1(0.4216 - 0.0080Ah + \cdots)^n + c_2(-0.2966 + 0.1798Ah + \cdots)^n$$

when Ah is small, where the c’s are determined by starting values and are such that
$c_1$ and $c_2$ are small relative to $c_0$. Thus deduce that the parasitic solutions cannot
be troublesome when Ah is of reasonably small magnitude and hence that
Hamming’s method displays short-range stability whether $\partial F/\partial y$ is positive or negative
unless h is abnormally large.

19 (a) to (e) Proceed as in Prob. 12 by use of Hamming’s method.


Section 6.8 


20 If the formula

$$y_{n+1} = y_n + h\left(\alpha_{-1}y'_{n+1} + \alpha_0y'_n\right)$$

is used for the numerical solution of the problem

$$y' = F(x, y) \qquad y(x_0) = y_0$$

if $\alpha_{-1} \ge 0$, $\alpha_0 \ge 0$, if $|F_y(x, y)| \le K$ throughout the calculation leading to $y_n$,
and if $Kh\alpha_{-1} < 1$, show that the error $\varepsilon_n$ in $y_n$ is bounded by the inequality

$$|\varepsilon_n| \le \frac{E}{Kh}\left[\left(\frac{1 + Kh\alpha_0}{1 - Kh\alpha_{-1}}\right)^n - 1\right] \approx \frac{E}{Kh}\left(e^{nKh} - 1\right)$$

where E is the largest error introduced in a single step. Also specialize to the cases
$(\alpha_{-1} = 0, \alpha_0 = 1)$, $(\alpha_{-1} = 1, \alpha_0 = 0)$, and $(\alpha_{-1} = \alpha_0 = \tfrac{1}{2})$, showing that E
can be taken to be $\tfrac{1}{2}h^2M_2 + R$ in the first two cases and to be $\tfrac{1}{12}h^3M_3 + R$ in
the third case, where $M_k$ is the maximum value of $|y^{(k)}(x)|$ for $x_0 \le x \le x_n$, and
where R is the maximum roundoff error introduced in a single step.

21 Suppose that $F_y(x, y)$ is known to be negative throughout the calculation
considered in Prob. 20, and also that

$$0 < \omega \le -F_y(x, y)$$

where $\omega$ is a constant. Show that then $\varepsilon_n$ is dominated by $e_n$, where

$$(1 + h\omega\alpha_{-1})e_{n+1} = (1 - h\omega\alpha_0)e_n + E \qquad (n = 0, 1, 2, \ldots)$$

and $e_0 = 0$, if $h\omega\alpha_0 < 1$, and deduce the more useful bound

$$|\varepsilon_n| \le \frac{E}{\omega h}\left[1 - \left(\frac{1 - h\omega\alpha_0}{1 + h\omega\alpha_{-1}}\right)^n\right] \approx \frac{E}{\omega h}\left(1 - e^{-n\omega h}\right)$$

in this case.
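22 Suppose that the formula of the Adams method, written in the form

$$y_{n+1} = y_n + h\alpha_{-1}y'_{n+1} + h\alpha_0y'_n + h\sum_{k=1}^{r}\alpha_ky'_{n-k}$$

is applied to the problem

$$y' = F(x, y) \qquad y(x_0) = y_0$$

where it is known that

$$0 < \omega \le -F_y(x, y) \le K$$

where $\omega$ and K are constants, and assume also that $\alpha_0 \ge 0$, $\alpha_{-1} \ge 0$, and
$h\omega\alpha_0 < 1$. Show that, if the maximum error introduced in a single step is E, then
the error $\varepsilon_n$ in $y_n$ is dominated by $e_n$, where

$$(1 + h\omega\alpha_{-1})e_{n+1} = (1 - h\omega\alpha_0)e_n + hK\sum_{k=1}^{r}|\alpha_k|\,e_{n-k} + E$$

if $|\varepsilon_k| \le e_k$ for k = 0, 1, 2, ..., r. Show that one solution of this equation is of
the form

$$e_n = \frac{E}{h\delta} - c\beta_0^n$$

where

$$\delta = \omega(\alpha_{-1} + \alpha_0) - K\sum_{k=1}^{r}|\alpha_k|$$

that it is possible to take $\beta_0$ such that $0 < \beta_0 < 1$ if $\delta > 0$, and that then

$$|\varepsilon_n| \le \bar{\varepsilon}\beta_0^{n-r} + \frac{E}{h\delta}\left(1 - \beta_0^n\right)$$

where $\bar{\varepsilon}$ is the absolute value of the largest of the errors $\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_r$.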
23 Show that the absolute values of the errors in the calculations of Prob. 14 are
dominated by

$$\left(\frac{h^2}{12} + \frac{R}{h}\right)\left[1 - \left(\frac{2 - h}{2 + h}\right)^n\right] \approx \left(\frac{h^2}{12} + \frac{R}{h}\right)\left(1 - e^{-nh}\right)$$

and

$$\left(\frac{h^2}{6} + \frac{R}{2h}\right)\left[\left(\sqrt{1 + h^2} + h\right)^n - 1\right] + |\varepsilon|\left(\sqrt{1 + h^2} + h\right)^n \approx \left(\frac{h^2}{6} + \frac{R}{2h}\right)\left(e^{nh} - 1\right) + |\varepsilon|e^{nh}$$

respectively, where R is the magnitude of the maximum roundoff error introduced
in a single step and $\varepsilon$ is the inherent error in $y_1$, and compare these bounds with the
direct error calculations of Prob. 16. (Use the results of Prob. 21 in the first case.)

24, 25 (a) to (e) Calculate values of $F_y$ at appropriate stages of the calculations of
Probs. 12 and 13, and obtain corresponding approximate error bounds, considering
separately the effects of truncation and roundoff errors and using the result of
Prob. 22 when it is appropriate. Estimate the truncation error in each step by
approximating $h^ny^{(n)}$ by $h\nabla^{n-1}y'_n$ in the appropriate error term.

26 Use any numerical step-by-step method for the calculation of approximate 
values of the solution of each of the following problems for x = 0.0(0.1)1.0, with 
an error which can be reasonably confidently expected to be less than one unit in 
the fifth decimal place: 


(a) $y' = x - y^2$, $y(0) = 1$
(b) $y' = x + \sin y$, $y(0) = \pi/2$
(c) γ΄ = e*, yO) = 1 


Section 6.9 


27 Suppose that closed formulas of the type (6.9.3) and (6.9.4) are used for the
numerical integration of (6.9.1), that the relevant truncation errors in the nth
step are $E_2$ and $E_2'$, respectively, and that another pair of formulas of open type is
used for prediction of $y_{n+1}$ and $u_{n+1}$, with truncation errors $E_1$ and $E_1'$, respectively.
If the predicted values are denoted by $y_{n+1}^{(0)}$ and $u_{n+1}^{(0)}$, the corrected values by
$y_{n+1}$ and $u_{n+1}$, and the true values by $Y_{n+1}$ and $U_{n+1}$, and if the notation

$$C = \frac{E_2 - E_1}{E_2} \qquad C' = \frac{E_2' - E_1'}{E_2'}$$

is introduced, show that there follows

$$Y_{n+1} - y_{n+1}^{(0)} = -(C - 1)E_2$$
$$Y_{n+1} - y_{n+1} = h\alpha_{-1}(U_{n+1} - u_{n+1}) + E_2$$
$$U_{n+1} - u_{n+1}^{(0)} = -(C' - 1)E_2'$$
$$U_{n+1} - u_{n+1} = h\alpha_{-1}'\left[G_{y,n+1}(Y_{n+1} - y_{n+1}) + G_{u,n+1}(U_{n+1} - u_{n+1})\right] + E_2'$$

where $G_{y,n+1}$ and $G_{u,n+1}$ are appropriate values of $G_y$ and $G_u$, respectively. By
eliminating $E_2$ and $E_2'$, express $T_{n+1} = Y_{n+1} - y_{n+1}$ and $T_{n+1}' = U_{n+1} - u_{n+1}$
as linear combinations of $\gamma_{n+1} = y_{n+1} - y_{n+1}^{(0)}$ and $\gamma_{n+1}' = u_{n+1} - u_{n+1}^{(0)}$. Thus
show that if

$$h|\alpha_{-1}'|\left[h|\alpha_{-1}||G_{y,n+1}| + |G_{u,n+1}|\right] \ll 1$$

so that also the convergence factor $\rho_{n+1}$ is such that $|\rho_{n+1}| \ll 1$, and if it is true
that $C' \approx C > 1$, then the approximations


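$$T_{n+1} \approx -\frac{1}{C}\left(\gamma_{n+1} + h\alpha_{-1}\gamma_{n+1}'\right) \qquad T_{n+1}' \approx -\frac{1}{C}\left(\gamma_{n+1}' + h\alpha_{-1}'G_{y,n+1}\gamma_{n+1}\right)$$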

generally provide better estimates than the usual simpler approximations
$T_{n+1} \approx -\gamma_{n+1}/C$ and $T_{n+1}' \approx -\gamma_{n+1}'/C$. In particular, obtain the estimates
Trai Ὁ —60ne1 + 3674..1) Tit © —8ne1 + Povnes) 


for the calculations of the illustrative example of Sec. 6.9. 

28 Obtain approximate values of the solution of each of the following problems for 
x = 0.0(0.1)1.0, determining appropriate starting values by power-series methods 
or otherwise, and proceeding by use of the modified Adams method, retaining only 
first differences, estimating the errors introduced in each step, and retaining an 
appropriate number of decimal places in the calculations: 


(a) $y'' - y = 0$, $y(0) = 1$, $y'(0) = -1$

(b) $y'' + 2y' + 2y = 0$, $y(0) = 1$, $y'(0) = -1$
(c) $xy'' + y' + xy = 0$, $y(0) = 1$, $y'(0) = 0$
(d) $y'' + y + y^2 = x$, $y(0) = 1$, $y'(0) = 0$

(e) $u' = x + u - v$, $u(0) = 0$
    $v' = x^2 - v + u^2$, $v(0) = 1$


29 (a) to (e) Repeat the calculations of Prob. 28, retaining differences through the
third.

30 (a) to (e) Use Milne’s method in Prob. 28.

31 (a) to (e) Use Hamming’s method in Prob. 28.


Section 6.10 


32 Suppose that the formula (6.10.3) is used with no differences to generate an
approximation to $e^{-x}$ as the solution of the problem

$$y'' = y \qquad y(0) = 1 \qquad y'(0) = -1$$

with spacing h, and that the value used for $y_1$ is in error by $\varepsilon$, so that $y_1 = e^{-h} - \varepsilon$.
Show that, if all subsequent calculations were effected without roundoff, then the
nth calculated ordinate would be given exactly by


1-h?/ 1 = ae 
= χε) = a el eT + e}e" og (1+h) 
al ἀμ alae" (; ~h 
= h? Pt ce 67" .. εὴ etiog[1/1-A)] 
2h 1+h 


where $x_n = nh$, and that the approximation

$$y(x_n) \approx \left(1 - \frac{h}{4} - \frac{\varepsilon}{2h}\right)e^{-x_n} + \left(\frac{h}{4} + \frac{\varepsilon}{2h}\right)e^{x_n}$$

would hold when h is small and n large, so that the relative error in $y(x_n)$ would
then be approximated by $(h^2 + 2\varepsilon)e^{2x_n}/(4h)$. Show also that the corresponding
relative error in the approximation to $e^{x}$, with the modified condition $y'(0) = +1$,
would be approximated by the constant $(h^2 + 2\varepsilon)/(4h)$ when h is small and n large,
again neglecting roundoffs.

33 Obtain approximate values of the solution of each of the following problems for
x = 0.0(0.1)1.0, using the Milne procedure (6.10.5) and (6.10.6), and estimating
the error introduced in each step:

(a) $y'' - y = 0$, $y(0) = 1$, $y'(0) = -1$

(b) $y'' + xy = 0$, $y(0) = 0$, $y'(0) = 1$

(c) y” + xy + exy® = 0, y(0) = 0, y'(0) = 1

(d) $y'' + \sin y = x$, $y(0) = \pi/2$, $y'(0) = 0$

34 (a) to (d) Obtain approximate error bounds for the calculations of Prob. 33.

35 Obtain approximate values of the solution of each of the following problems at
the points noted, using the Milne procedure (6.10.5) and (6.10.6) after introducing
the transformation (6.10.23), and estimating the error introduced in each step:

(a) $xY'' + Y' + xY = 0$, $Y(1) = 0.76520$, $Y'(1) = -0.44005$ [x = 1.0(0.1)2.0]

(b) $Y'' + 2Y' + x^2Y = 0$, $Y(0) = 0$, $Y'(0) = 1$ [x = 0.0(0.1)1.0]

(c) $Y'' + 2Y' + xY^2 = 0$, $Y(0) = 1$, $Y'(0) = -1$ [x = 0.0(0.1)1.0]

36 Verify that the substitution

$$Y = py \qquad p = e^{-(1/2)\int P\,dx}$$

reduces the equation

$$Y'' + P(x)Y' = H(x, Y)$$

to the form

$$y'' = \tfrac{1}{4}(2P' + P^2)y + \frac{1}{p}H(x, py)$$

which is a special case of (6.10.1), and verify also that this transformation includes
the reduction of (6.10.22) to (6.10.25) when H is linear in Y.
37 Show that the equation $y'' + f(x)y = 0$ is satisfied by

$$y(x) = A(x)\cos\theta(x)$$

where

$$\theta(x) = \int_{x_0}^{x} v(t)\,dt + \omega$$

if A and v satisfy the equations $A'' - Av^2 + fA = 0$ and $2A'v + Av' = 0$, or
hence if

$$A'' + fA = \frac{c^2}{A^3} \qquad v \equiv \theta' = \frac{c}{A^2}$$

where c and $\omega$ are arbitrary constants. Show also that the conditions

$$A(x_0) = A_0 \qquad A'(x_0) = 0 \qquad A''(x_0) = 0$$

which tend to require that A(x) remain constant near $x = x_0$, are consistent with
the conditions $y(x_0) = y_0$ and $y'(x_0) = y_0'$ if $A_0$ and $\omega$ satisfy the relations

$$A_0\cos\omega = y_0 \qquad A_0\sin\omega = -\frac{y_0'}{f_0^{1/2}}$$

and if

$$c = f_0^{1/2}A_0^2$$

under the assumption that $f_0 \equiv f(x_0) > 0$. {This procedure, attributed to
Madelung [1931], is often useful when f(x) is large and positive, so that y(x) is strongly
oscillatory, since A(x) often varies much less rapidly. A similar transformation,
which is often useful when f(x) is large and negative, and y(x) increases or decreases
rapidly, may be obtained analogously by replacing $\cos\theta$ by $\cosh\theta$, $\sinh\theta$, or $e^{\theta}$
in the expression assumed for y, according as the ratio of $|y_0|$ to $|y_0'|/(-f_0)^{1/2}$ is
greater than, less than, or equal to unity, respectively.}
38 Use the results of Prob. 37 to show that the solution of the problem

$$y'' + (16 - x^2)y = 0 \qquad y(0) = 1 \qquad y'(0) = 0$$

can be expressed in the form

$$y(x) = A(x)\cos\theta(x)$$

where A(x) is the solution of the problem

$$A'' + (16 - x^2)A = \frac{16}{A^3} \qquad A(0) = 1 \qquad A'(0) = 0$$

and where

$$\theta(x) = 4\int_0^x \frac{dt}{[A(t)]^2}$$

Also determine A(x), and hence $\theta(x)$ and y(x), for x = 0.0(0.1)1.0 to five places,
by a numerical method.


Section 6.11 


39 to 41 (a) to (e) Advance the calculations of Probs. 12, 13, and 19 to x = 1.2
with h = 0.05, given that $\phi(1.1) = 1.43392$ and $\phi(1.2) = 1.48080$ in Prob. 8.

42 to 44 (a) to (e) Advance the calculations of Probs. 28 to 30 to x = 1.2 with
h = 0.05.

45 (a) to (d) Advance the calculation of Prob. 33 to x = 1.2 with h = 0.05.


Section 6.12


46(a) to (d), 47(a) to (e), 48(a) to (d) Obtain approximate solutions of Probs. 5, 28,
and 33 for x = 0.0(0.1)1.0 by use of (6.12.2) and (6.12.3).


Section 6.13 


49 (a) to (e) Obtain approximate values of the solutions of Probs. 5a to d and 8
for x = 0.0(0.1)1.0 by use of (6.13.15) and (6.13.16), and estimate the errors.


Section 6.14 


50 (a) to (e) Obtain approximate values of the solutions of Probs. 5a to d and 8
for x = 0.0(0.1)1.0 by use of (6.14.5) and (6.14.6), and estimate the errors.

51(a) to (e), 52(a) to (d) Obtain approximate values of the solutions of Probs. 28
and 33 for x = 0.0(0.1)1.0 by use of (6.14.12) and (6.14.13), and estimate the
errors.


Section 6.15 


53 Use an appropriate step-by-step method to determine approximate five-place
values of u(x) such that $u'' + u = 1$, $u(0) = 0$, $u'(0) = 0$, and of v(x) such that
$v'' + v = 0$, $v(0) = 0$, $v'(0) = 1$, for x = 0.0(0.1)1.0. Then use these results to
determine approximate values of y(x) for x = 0.0(0.1)1.0 satisfying the conditions
$y''(x) + y(x) = 1$, $y(0) = 0$, $y(1) = 1$, and compare the results with exact values.

54 Use an appropriate step-by-step method to determine approximate five-place
values of the solution of the problem

$$u'' + u = 1 \qquad u(0) = 0 \qquad u'(0) = \alpha$$

for x = 0.0(0.1)1.0, taking successively $\alpha = 0$ and $\alpha = 1$. Then use linear
interpolation to estimate the value of $\alpha$ for which u(1) = 1, and investigate the
correctness of this estimate by making another corresponding step-by-step
calculation (see Prob. 55).

55 Prove that the procedure described in connection with Eqs. (6.15.11) and (6.15.12)
would yield an exact result with linear interpolation on $\alpha$ if (6.15.11) were a linear
equation and if no errors were committed in the determination of solutions
corresponding to two trial values of $\alpha$.

56 Obtain approximate values of the solution of the problem

$$y'' + y = 0 \qquad y(0) = 0 \qquad y(1) = 1$$

for x = 0.0(0.1)1.0 by use of (6.15.18) with h = 0.2.

57 Repeat the calculation of Prob. 56 using (6.15.17) with h = 0.2, and compare
the two approximations.

58 Repeat the calculation of Prob. 56 using (6.15.19) with h = 0.2, together with
(6.15.20) and (6.15.21).

59 Use the method of Prob. 56, together with (6.15.22), to deal with the modification
of that problem in which the condition y(0) = 0 is replaced by the condition
$y'(0) = y(0)$.

60 Repeat the calculation of Prob. 59 using the method of Prob. 57, and compare
the results with those obtained in Prob. 59.


Section 6.16 


61 Determine approximate values of the smallest characteristic value of $\lambda$ for the
problem

$$y'' + \lambda y = 0 \qquad y(0) = y(1) = 0$$

by use of (6.16.3) and (6.16.4), and compare those approximations with the true
value $\pi^2$, and with the corresponding approximations based on the use of (6.15.18)
with N = 1 and 2.

62 Repeat the calculations of Prob. 61 when the condition y(0) = 0 is replaced by
the condition $y'(0) = 0$, making use of (6.15.22), in each case, in such a way that
the order of the procedure is not reduced; compare the results which correspond to
the use of the approximate condition $\Delta y_0 \equiv y_1 - y_0 = 0$.

63, 64 Repeat the calculations of Probs. 61 and 62, making use of (6.16.2) with
N = 3.

65 to 68 Deal as in Probs. 61 to 64 with the corresponding modified formulations
involving the equation $y'' + \lambda xy = 0$. [The true characteristic numbers in Probs.
65 and 67 are the zeros of the function $J_{1/3}(2\lambda^{1/2}/3)$, the smallest of which rounds
to 18.956, whereas those of Probs. 66 and 68 are the zeros of the function
$J_{-1/3}(2\lambda^{1/2}/3)$, the smallest of which rounds to 7.8373.]


7 


LEAST-SQUARES POLYNOMIAL 
APPROXIMATION 


7.1 Introduction 


There are two classes of situations in which the process of determining an 
approximation (polynomial or otherwise) to a function by fitting given data 
exactly on a certain set of discrete points often is a particularly inefficient one. 

First, when the function f(x), to be approximated, is specified for all 
values of x in an interval, it is clearly desirable to take many or all of the known 
values into account, rather than to select an arbitrary set, consisting of the least 
possible number of discrete values which leads to a determinate set of conditions. 
This is especially true when f(x) or one of its derivatives of low order possesses 
known finite discontinuities, or “jumps.” 

Second, and on the opposite extreme, when only a discrete set of approx- 
imate values of f(x) is provided, and when the degree of reliability of those 
values is not well established, it is foolish (and, indeed, inherently dangerous) 
to attempt to determine a polynomial of high degree which fits the vagaries of 
such data exactly and hence, in all probability, is represented by a curve which 
oscillates violently about the curve which represents the true function. In 
particular, the use of the result for numerical differentiation would be hard to 
justify. 

The so-called method of least squares, which is designed for the treatment
of both these classes of problems, is introduced in the present chapter, and its 
application to the analysis of typical situations is treated. Several of the classical 
sets of orthogonal polynomials, which are particularly useful in these applications 
and which also will be needed in Chap. 8, are introduced, and certain of their 
properties are discussed. 


7.2 The Principle of Least Squares 


In place of determining a polynomial approximation y(x), of degree n, to a 
certain function f(x), by requiring that the values of y(x) on a set of n + 1 
points agree with known exact or approximate values of f(x) at those points, 
as was done in preceding chapters, it is often preferable to require that y(x) 
and f(x) agree as well as possible (in some sense) over a domain D of greater 
extent. This domain may be taken as a continuous interval, when f(x) is 
specified analytically, or as a discrete set, say, of N + 1 points, where N > n. 

When the available data in D are either exact or of equal reliability, 
it is frequently agreed that the “best approximation” over D is that one for 
which the aggregate (sum or integral) of the squared error in D is least. This 
postulate is often known as Legendre’s principle of least squares. More generally, 
if $w(x_i)$ is a measure of the dependability of the value assigned to f(x) when
$x = x_i$, the criterion is modified by requiring that the squared error at $x_i$ be
multiplied by the weight $w(x_i)$ before the aggregate is calculated.

Suppose first that exact values of f(x) are known over a certain domain
D, which may consist of a discrete set of points $x_0, x_1, \ldots, x_N$ or of a continuous
interval [a, b], and that the approximation is to be of the form

$$f(x) \approx \sum_{k=0}^{n} a_k\phi_k(x) \equiv y(x) \qquad (7.2.1)$$

where $\phi_0(x), \ldots, \phi_n(x)$ are n + 1 appropriately chosen linearly independent
functions. In particular, in order to obtain a polynomial approximation of
degree n, we could take $\phi_0 = 1, \phi_1 = x, \ldots, \phi_n = x^n$, although other choices
of the coordinate functions, which would also afford a basis for the generation
of all polynomials of degree n, are often more convenient, as will be seen.
It is supposed that the specified weighting function w(x) is nonnegative in D,

$$w(x) \ge 0 \qquad (7.2.2)$$

If we define the residual R(x) by the equation

$$R(x) = f(x) - \sum_{k=0}^{n} a_k\phi_k(x) \equiv f(x) - y(x) \qquad (7.2.3)$$
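
the best approximation (7.2.1), in the least-squares sense, is defined to be that
for which the a’s are determined so that the aggregate (sum or integral) of
$w(x)R^2(x)$ over D is as small as possible. It is convenient to denote this aggregate
here by $\langle wR^2\rangle$. The requirement

$$\langle wR^2\rangle \equiv \left\langle w\left[f - \sum_{k=0}^{n} a_k\phi_k\right]^2\right\rangle = \min \qquad (7.2.4)$$

then imposes the necessary conditions

$$\frac{\partial}{\partial a_r}\left\langle w\left[f - \sum_{k=0}^{n} a_k\phi_k\right]^2\right\rangle = 0 \qquad (r = 0, 1, \ldots, n) \qquad (7.2.5)$$

or

$$\left\langle w\phi_r\left[f - \sum_{k=0}^{n} a_k\phi_k\right]\right\rangle \equiv \langle w\phi_r(f - y)\rangle = 0 \qquad (7.2.6)$$

or

$$\sum_{k=0}^{n} a_k\langle w\phi_r\phi_k\rangle = \langle w\phi_rf\rangle \qquad (r = 0, 1, \ldots, n) \qquad (7.2.7)$$

and hence leads to n + 1 simultaneous linear equations in the n + 1 unknown
parameters $a_0, a_1, \ldots, a_n$. These equations are called the normal equations
of the process.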

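It is useful to notice that these conditions can be expressed also in the form

$$\langle w(x)\phi_r(x)R(x)\rangle = 0 \qquad (r = 0, 1, \ldots, n) \qquad (7.2.8)$$

Hence, since we have also

$$\langle wR^2\rangle \equiv \langle wR \cdot R\rangle = \left\langle wR\left[f - \sum_{k=0}^{n} a_k\phi_k\right]\right\rangle = \langle wRf\rangle - \sum_{k=0}^{n} a_k\langle w\phi_kR\rangle \qquad (7.2.9)$$

it follows that, when the coefficients $a_0, \ldots, a_n$ satisfy (7.2.7), the corresponding
aggregate squared residual reduces to

$$A_n \equiv \langle wR^2\rangle_{\min} = \langle wRf\rangle = \langle wf(f - y)\rangle = \langle wf^2\rangle - \sum_{k=0}^{n} a_k\langle w\phi_kf\rangle \qquad (7.2.10)$$

The smallness of the quantity $A_n$ can be used as a criterion for the efficiency
of the approximation over D.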
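
The relations (7.2.7), (7.2.8), and (7.2.10) can be checked numerically. The following Python sketch is illustrative only: the five ordinates, the unit weights, and the helper names `bracket` and `solve` are assumptions introduced here and do not come from the text. It forms the normal equations for the coordinate functions 1, x, x² over a discrete domain, solves them in exact rational arithmetic, and confirms that the weighted residual is orthogonal to each coordinate function and that the two expressions for the minimum aggregate squared residual agree.

```python
from fractions import Fraction

# Hypothetical discrete domain D (N + 1 = 5 points), ordinates, and unit weights.
xs = [Fraction(k) for k in range(5)]
fs = [Fraction(v) for v in (1, 2, 5, 10, 18)]
ws = [Fraction(1)] * len(xs)

def bracket(us, vs):
    """Weighted aggregate <w u v>, here a sum over the points of D."""
    return sum(w * u * v for w, u, v in zip(ws, us, vs))

def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Coordinate functions 1, x, x^2 tabulated on D.
basis = [[Fraction(1)] * len(xs), list(xs), [x * x for x in xs]]

gram = [[bracket(pr, pk) for pk in basis] for pr in basis]  # <w phi_r phi_k>
rhs = [bracket(pr, fs) for pr in basis]                     # <w phi_r f>
a = solve(gram, rhs)                                        # normal equations (7.2.7)

# Approximation y and residual R = f - y at the points of D.
y = [sum(ak * pk[i] for ak, pk in zip(a, basis)) for i in range(len(xs))]
R = [f - yi for f, yi in zip(fs, y)]

orthogonality = [bracket(pr, R) for pr in basis]            # (7.2.8): each is zero
direct = bracket(R, R)                                      # <w R^2> computed directly
via_formula = bracket(fs, fs) - sum(ak * bk for ak, bk in zip(a, rhs))  # (7.2.10)

print(orthogonality == [0, 0, 0], direct == via_formula)
```

Exact fractions are used so that the two identities hold to the last digit rather than only to within roundoff; with floating-point data the same comparisons would need a tolerance.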

In particular, if the domain D consists only of n + 1 discrete points,
and if the set of functions $S_n$ generated by the coordinate functions $\phi_0, \ldots, \phi_n$
comprises, say, all polynomials of degree not exceeding n, it is possible to
reduce R(x) to zero at each point of the domain. Thus here the least-squares
procedure reduces to the determination of the polynomial y(x) of degree n
which agrees exactly with f(x) at n + 1 points, and the minimum value of
$\langle wR^2\rangle$ is zero. If the domain consists of N + 1 points, where N > n, or of a
continuous interval, exact fit over all of D is generally impossible and the
procedure gives the function of the class considered which affords the best
approximate fit under the criterion (7.2.4), in which the weighting function
w(x) must be specified.

It is seen from (7.2.7) that the coefficients of the unknowns in the left-hand 
members of the normal equations are independent of the function f(x) to be 
approximated, so that they may be precalculated, once the coordinate functions 
and the weighting function have been selected. Also, since $\langle w\phi_i\phi_j\rangle = \langle w\phi_j\phi_i\rangle$,
the coefficient of $a_i$ in the jth equation is equal to that of $a_j$ in the ith equation,
so that the array of the coefficients of the a’s is symmetrical with respect to its 
principal diagonal. This fact appreciably reduces the labor in both the forma- 
tion and the solution of the set of equations (see Sec. 10.4). 

Clearly, these equations are greatly simplified if the coordinate functions
are chosen, in advance, in such a way that

$$\langle w\phi_i\phi_j\rangle = 0 \qquad (i \ne j) \qquad (7.2.11)$$

A set of $\phi$’s having this property over D is said to be an orthogonal set, relative
to the weighting function w(x), over D. For such a set of coordinate functions,
the corresponding set of normal equations (7.2.7) becomes “uncoupled” and
takes the form

$$a_r\langle w\phi_r^2\rangle = \langle w\phi_rf\rangle \qquad (r = 0, 1, \ldots, n) \qquad (7.2.12)$$

Since w(x) is nonnegative, the coefficient of $a_r$ cannot vanish, except in a very
special situation where w(x) vanishes at all points of D for which $\phi_r(x)$ does not.
Henceforth we exclude such cases and accordingly obtain the result

$$a_r = \frac{\langle w\phi_rf\rangle}{\langle w\phi_r^2\rangle} \qquad (r = 0, 1, \ldots, n) \qquad (7.2.13)$$

Further, reference to (7.2.10) and (7.2.12) shows that the corresponding
minimum value of $\langle wR^2\rangle$ can be expressed in the more convenient alternative
form

$$A_n \equiv \langle wR^2\rangle_{\min} = \langle wf^2\rangle - \sum_{k=0}^{n} a_k^2\langle w\phi_k^2\rangle \qquad (7.2.14)$$

in this case.
In theoretical work it is often convenient to suppose that the $\phi$’s have
also been normalized in such a way that $\langle w\phi_r^2\rangle = 1$, so that (7.2.13) and
(7.2.14) are still further simplified. However, this normalization is rarely
convenient in practice.
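
To illustrate how orthogonality uncouples the normal equations, the sketch below applies (7.2.13) and (7.2.14) to the five data points of the straight-line example of Sec. 7.3, using the coordinate functions 1 and x − 2. That choice of functions (orthogonal over x = 0, 1, ..., 4 with unit weights) and the helper name `bracket` are assumptions made here for illustration; they are not taken from the text.

```python
from fractions import Fraction

# Data of the straight-line example in Sec. 7.3, with unit weights.
xs = [Fraction(k) for k in range(5)]
fs = [Fraction(s) for s in ("1.00", "3.85", "6.50", "9.35", "12.05")]
ws = [Fraction(1)] * len(xs)

def bracket(us, vs):
    """Weighted aggregate <w u v> over the discrete domain."""
    return sum(w * u * v for w, u, v in zip(ws, us, vs))

# Assumed coordinate functions: phi_0 = 1 and phi_1 = x - 2 (x shifted by its mean).
phi0 = [Fraction(1)] * len(xs)
phi1 = [x - 2 for x in xs]
basis = [phi0, phi1]
assert bracket(phi0, phi1) == 0  # orthogonality condition (7.2.11)

# Each coefficient follows from its own equation, as in (7.2.13).
a = [bracket(p, fs) / bracket(p, p) for p in basis]

# Minimum aggregate squared residual from (7.2.14).
min_sq = bracket(fs, fs) - sum(ar * ar * bracket(p, p) for ar, p in zip(a, basis))

print([float(v) for v in a], float(min_sq))  # prints [6.55, 2.76] 0.009
```

The resulting approximation 6.55 + 2.76(x − 2) is the same straight line as 1.03 + 2.76x, obtained here without solving any simultaneous equations.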

The root-mean-square (RMS) error in an approximation over D, relative
to w(x), is defined to be

$$E_{\mathrm{RMS}} \equiv (f - y)_{\mathrm{RMS}} = \sqrt{\frac{\langle w(f - y)^2\rangle}{\langle w\rangle}} \qquad (7.2.15)$$

Here, in particular, when w(x) = 1 the quantity $\langle 1\rangle$ represents the length of
the interval in the continuous case and the number (N + 1) of points in D in
the discrete case.

In the discrete case, it frequently happens that the given data are empirical
and correspond accordingly to an observed function $\bar{f}(x)$, and that the true
function f(x) is not known. Here we must replace f(x) and y(x) by $\bar{f}(x)$ and $\bar{y}(x)$
in the preceding developments, and we then are in position to calculate only
$\bar{E}_{\mathrm{RMS}} \equiv (\bar{f} - \bar{y})_{\mathrm{RMS}}$ over D. The subsequent estimation of the desired quantity
$(f - \bar{y})_{\mathrm{RMS}}$ is considered in Sec. 7.4.


7.3 Least-Squares Approximation over Discrete Sets of Points 


Before exploiting the convenience afforded by the use of orthogonal functions, 
we here consider the application of the general least-squares method to the case 
when the domain D comprises a discrete set of points. The case when D is a 
continuous interval is treated in a completely analogous way. 

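In accordance with the results of the preceding section, if an approximation
to the true function f of the form

$$f(x) \approx \sum_{k=0}^{n} a_k\phi_k(x) \qquad (7.3.1)$$

is to hold over a set $S_N$ of N + 1 points $x_0, x_1, \ldots, x_N$, where $N \ge n$, in the
sense that the aggregate weighted squared error is to be a minimum,

$$\sum_{i=0}^{N} w(x_i)\left[f(x_i) - \sum_{k=0}^{n} a_k\phi_k(x_i)\right]^2 = \min \qquad (7.3.2)$$

the set of n + 1 normal equations (7.2.7) becomes

$$a_0\sum_{i=0}^{N} w(x_i)\phi_r(x_i)\phi_0(x_i) + a_1\sum_{i=0}^{N} w(x_i)\phi_r(x_i)\phi_1(x_i) + \cdots + a_n\sum_{i=0}^{N} w(x_i)\phi_r(x_i)\phi_n(x_i) = \sum_{i=0}^{N} w(x_i)\phi_r(x_i)f(x_i) \qquad (r = 0, 1, \ldots, n) \qquad (7.3.3)$$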

These equations can be obtained quite simply by first writing down the
N + 1 equations which would require that (7.3.1) be an equality at the N + 1
points $x_i$,

$$\begin{aligned}
a_0\phi_0(x_0) + a_1\phi_1(x_0) + \cdots + a_n\phi_n(x_0) &= f(x_0) \\
a_0\phi_0(x_1) + a_1\phi_1(x_1) + \cdots + a_n\phi_n(x_1) &= f(x_1) \\
&\;\;\vdots \\
a_0\phi_0(x_N) + a_1\phi_1(x_N) + \cdots + a_n\phi_n(x_N) &= f(x_N)
\end{aligned} \qquad (7.3.4)$$

The rth normal equation is then obtained by multiplying each equation by the
coefficient of $a_r$ in that equation, and by the weight associated with that equation,
and summing the results. Unless there is a reason for proceeding otherwise,
the weights are generally taken to be equal and then may be assigned the value
unity.

When N = n, the problem reduces to that of satisfying the n + 1 equations
(7.3.4) in n + 1 unknowns, and the normal equations are equivalent to the
original ones.

As a very simple example, suppose that the problem is that of fitting
the equation of a straight line as well as possible (in the least-squares sense)
to the following data:

x       0       1       2       3       4
f(x)    1.00    3.85    6.50    9.35    12.05

In place of writing out the equations (7.3.4), corresponding to the substitution
of these corresponding values into the equation

$$a_0 + a_1x = f(x) \qquad (7.3.5)$$

we may merely write down the array of the coefficients of $a_0$ and $a_1$ and the
right-hand members (the augmented matrix of the system) in the form

$$\begin{bmatrix} 1 & 0 & 1.00 \\ 1 & 1 & 3.85 \\ 1 & 2 & 6.50 \\ 1 & 3 & 9.35 \\ 1 & 4 & 12.05 \end{bmatrix} \qquad (7.3.6)$$

Under the assumption that all the data are of equal significance, we take
all weights equal to unity. The first normal equation then corresponds to the
result of adding the elements of the respective columns of (7.3.6), to give the
array [5 10 32.75], and the second corresponds to the result of multiplying
the elements in each row by the element of that row which lies in the second
column, and adding the results, to give the array [10 30 93.10], so that the
normal equations are

$$\begin{aligned} 5a_0 + 10a_1 &= 32.75 \\ 10a_0 + 30a_1 &= 93.10 \end{aligned} \qquad (7.3.7)$$

yielding the solution $a_0 = 1.03$, $a_1 = 2.76$, and hence determining the linear
approximation

$$f(x) \approx y(x) = 1.03 + 2.76x \qquad (7.3.8)$$

The values obtained from this approximation at the points x = 0, 1, 2, 3,
and 4 are 1.03, 3.79, 6.55, 9.31, and 12.07, respectively, and the sum of the
squared errors is found to be 0.0090. Thus the RMS error over these five points
is $\sqrt{0.0090/5} \approx 0.042$.
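
The row-and-column rule used above is mechanical enough to sketch in a few lines of Python. The code below is a minimal illustration, not a general least-squares routine: the function name `normal_row` is hypothetical, unit weights are assumed as in the example, and the two normal equations are solved by Cramer's rule only because the system is 2 by 2.

```python
import math
from fractions import Fraction

# Augmented matrix (7.3.6): coefficient of a0, coefficient of a1, right-hand member.
rows = [(1, 0, "1.00"), (1, 1, "3.85"), (1, 2, "6.50"), (1, 3, "9.35"), (1, 4, "12.05")]
aug = [[Fraction(c) for c in row] for row in rows]

def normal_row(r):
    """Multiply each row by its element in column r, then add the columns."""
    return [sum(row[r] * row[j] for row in aug) for j in range(3)]

n0 = normal_row(0)  # [5, 10, 32.75]
n1 = normal_row(1)  # [10, 30, 93.10]

# Solve the two normal equations (7.3.7) by Cramer's rule.
det = n0[0] * n1[1] - n0[1] * n1[0]
a0 = (n0[2] * n1[1] - n0[1] * n1[2]) / det
a1 = (n0[0] * n1[2] - n0[2] * n1[0]) / det

# Sum of squared errors and RMS error over the five points.
sum_sq = sum((f - (a0 + a1 * x)) ** 2 for _, x, f in aug)
rms = math.sqrt(sum_sq / len(aug))

print(float(a0), float(a1), float(sum_sq), round(rms, 3))  # prints 1.03 2.76 0.009 0.042
```

With unequal weights, each product in `normal_row` would also carry the weight attached to its row, in accordance with (7.3.3).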

The interpretation of this result must depend upon the context. If the
given values are considered to be exact values of a true function, then the figure
0.042 represents the RMS departure between the true function f(x) and the
approximation y(x) over the five points for which information is available.
In the absence of any further information, this figure would afford the only
available estimate of the RMS error over the continuous range $0 \le x \le 4$.
On the other hand, if the given ordinates are empirical and hence properly
correspond to an observed function $\bar{f}(x)$, the figure 0.042 represents only the
RMS departure between $\bar{f}(x)$ and its least-squares approximation $\bar{y}(x)$ over the
five relevant points. Unless additional information is supplied or additional
assumptions are made, no conclusions can be drawn with respect to the RMS
value of the true error $f - \bar{y}$.

However, if it is postulated that the true function is such that the residuals 
at each of the N + 1 points can be reduced to zero (or, more realistically, can 
be made negligible), but that the impossibility of achieving this end in the case 
at hand is due to the presence of independent random errors in the several 
observed values, then it is possible to obtain a certain amount of additional 
information. It is also frequently desirable to estimate the errors in the calculated 
coefficients. For both these purposes, we examine the general problem in greater 
detail in the following section. 


7.4 Error Estimation 


Suppose that the right-hand members of (7.3.4) are replaced by values of an
observed function $\bar{f}(x)$, and that the calculated coefficients then are denoted by
$\bar{a}_0, \ldots, \bar{a}_n$, so that those relations become

$$\sum_{k=0}^{n} \bar{a}_k\phi_k(x_i) = \bar{f}(x_i) \qquad (i = 0, 1, \ldots, N) \qquad (7.4.1)$$

whereas the proper equations are

$$\sum_{k=0}^{n} a_k\phi_k(x_i) = f(x_i) \equiv \bar{f}(x_i) + E(x_i) \qquad (7.4.2)$$

where $E(x_i)$ is the error associated with the “observed value” $\bar{f}(x_i)$. The normal
equations (7.3.3), associated with (7.4.1), then can be written in the form

$$\sum_{k=0}^{n} c_{rk}\bar{a}_k = \bar{v}_r \qquad (r = 0, 1, \ldots, n) \qquad (7.4.3)$$

where

$$c_{rk} \equiv c_{kr} = \sum_{i=0}^{N} w(x_i)\phi_r(x_i)\phi_k(x_i) \qquad (7.4.4)$$

and

$$\bar{v}_r = \sum_{i=0}^{N} w(x_i)\phi_r(x_i)\bar{f}(x_i) \qquad (7.4.5)$$

The corresponding approximation $\sum \bar{a}_k\phi_k(x)$ will be denoted by $\bar{y}(x)$, and the
residual $\bar{f}(x_i) - \bar{y}(x_i)$ by $\bar{R}(x_i)$.
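If we denote by $C_{rk}$ the cofactor of $c_{rk}$ in the coefficient matrix of (7.4.3),

$$\mathbf{C} = \begin{pmatrix} c_{00} & c_{01} & \cdots & c_{0n} \\ c_{10} & c_{11} & \cdots & c_{1n} \\ \vdots & \vdots & & \vdots \\ c_{n0} & c_{n1} & \cdots & c_{nn} \end{pmatrix} \tag{7.4.6}$$

and define the reduced cofactor $\tilde C_{rk} = C_{rk}/D$, where $D$ is the determinant of
$\mathbf{C}$, noticing that $\tilde C_{rk} = \tilde C_{kr}$ because of the symmetry of the array, the solution
of the set (7.4.3) can be expressed in the form (see Sec. 10.2)

$$\bar a_r = \sum_{k=0}^{n} \tilde C_{rk} \bar v_k \qquad (r = 0, 1, \ldots, n) \tag{7.4.7}$$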


In order to express the $\bar a$'s directly in terms of the given ordinates, we introduce
(7.4.5) into (7.4.7), to obtain

$$\bar a_r = \sum_{k=0}^{n} \tilde C_{rk} \sum_{i=0}^{N} w(x_i) \phi_k(x_i) \bar f(x_i) = \sum_{i=0}^{N} \left[ \sum_{k=0}^{n} \tilde C_{rk} \phi_k(x_i) \right] w(x_i) \bar f(x_i)$$

Thus, if we introduce the abbreviation

$$\Phi_r(x) = \sum_{k=0}^{n} \tilde C_{rk} \phi_k(x) \tag{7.4.8}$$

this relation takes the form

$$\bar a_r = \sum_{i=0}^{N} w(x_i) \Phi_r(x_i) \bar f(x_i) \tag{7.4.9}$$

Accordingly, if $a_r$ denotes the corresponding coefficient calculated from
the true ordinates, there follows also

$$a_r - \bar a_r = \sum_{i=0}^{N} w(x_i) \Phi_r(x_i) E(x_i) \tag{7.4.10}$$

This relation gives the difference between the rth coefficient actually obtained
and that which would have been obtained if no observational errors (or roundoff
errors) were present. If the hypothetical assumption is made that $f(x)$ is actually
a member of the set of all functions expressible as linear combinations of $\phi_0, \ldots,
\phi_n$, then $f(x)$ truly is specified by the constants $a_0, \ldots, a_n$, and we then have
$f(x) - \bar y(x) = \sum (a_r - \bar a_r) \phi_r(x)$.

Generally, only bounds on the observational errors $E(x_i)$, or mean values
of their squares over a set of observations, are available in practice. In the
latter case, the weights $w(x_0), \ldots, w(x_N)$ desirably are so chosen that $w(x_i)$ is
inversely proportional to the mean value of $E^2(x_i)$, so that the mean values of
$w(x_0)E^2(x_0), \ldots, w(x_N)E^2(x_N)$ over the set of observations are (approximately)
equal.

Assuming that this has been done, we may first obtain from (7.4.10) the
relation

$$(a_r - \bar a_r)^2 = [w(x_0)\Phi_r^2(x_0)][w(x_0)E^2(x_0)] + \cdots + [w(x_N)\Phi_r^2(x_N)][w(x_N)E^2(x_N)] + \cdots \tag{7.4.11}$$

where the omitted terms at the end involve products of the form $E(x_i)E(x_j)$
where $i \ne j$. If both sides of this equation are averaged over the available
set of observations, if the mean value of the product $E(x_i)E(x_j)$ is assumed to
be zero when $i \ne j$, and if $w(x_0), \ldots, w(x_N)$ are assigned values such that

$$w(x_0)[E^2(x_0)]_m = \cdots = w(x_N)[E^2(x_N)]_m = \sigma^2 \tag{7.4.12}$$

where $\sigma^2$ is a conveniently chosen constant, there then follows

$$[(a_r - \bar a_r)^2]_m = \sigma^2 \sum_{i=0}^{N} w(x_i) \Phi_r^2(x_i) \tag{7.4.13}$$

Here the subscript m indicates the mean over the set of observations.

The quantity $[(a_r - \bar a_r)^2]_m$ may be referred to as the estimated variance of $\bar a_r$
or of its error $\delta a_r = a_r - \bar a_r$, and the square root of that quantity as the
corresponding estimated standard deviation (see Sec. 1.7).

This result can be put into a more convenient form. For this purpose,
we notice first that if $\bar f(x_i)$ is identified with $\phi_s(x_i)$ in (7.4.1), where $0 \le s \le n$,
there follows $\bar a_r = \delta_{rs}$, where $\delta_{rs} = 1$ when $r = s$ and 0 otherwise. Thus we
deduce from (7.4.9) that

$$\sum_{i=0}^{N} w(x_i) \Phi_r(x_i) \phi_s(x_i) = \delta_{rs} \tag{7.4.14}$$

so that the two sets of functions $\{\phi_r(x)\}$ and $\{\Phi_r(x)\}$ are biorthogonal (actually
"biorthonormal") on $S_N$ relative to $w(x)$. Reference to (7.4.8) then gives

$$\sum_{i=0}^{N} w(x_i) \Phi_r^2(x_i) = \sum_{i=0}^{N} w(x_i) \Phi_r(x_i) \left[ \sum_{k=0}^{n} \tilde C_{rk} \phi_k(x_i) \right] = \sum_{k=0}^{n} \left[ \sum_{i=0}^{N} w(x_i) \Phi_r(x_i) \phi_k(x_i) \right] \tilde C_{rk} = \sum_{k=0}^{n} \delta_{rk} \tilde C_{rk} = \tilde C_{rr} = \frac{C_{rr}}{D} \tag{7.4.15}$$

Thus (7.4.13) can be written in the form

$$[(a_r - \bar a_r)^2]_m = \frac{C_{rr}}{D} \sigma^2 \tag{7.4.16}$$

where $C_{rr}$ is the cofactor of $c_{rr}$ in the matrix $\mathbf{C}$ and $D$ is the determinant of that
matrix.

It may happen that the individual values of $[E^2(x_i)]_m$ for $i = 0, 1, \ldots, N$
are not known but that their ratios can be estimated, that is, that the relative
dependability of the measurements at $x_0, x_1, \ldots, x_N$ is known. If $w(x_0), \ldots,
w(x_N)$ again are assumed to be so chosen that (7.4.12) holds, where $\sigma^2$ now is an
unknown constant, and if it is now assumed (hypothetically) that the true
function can be fitted exactly at the N + 1 points involved, it is possible to
obtain an estimate of $\sigma^2$ in terms of the calculable residuals $R(x_0), \ldots, R(x_N)$,
such that

$$R(x_i) = \bar f(x_i) - \bar y(x_i) = \bar f(x_i) - \sum_{k=0}^{n} \bar a_k \phi_k(x_i) \qquad (i = 0, \ldots, N) \tag{7.4.17}$$


For this purpose, we notice first that, since $E(x_i) = f(x_i) - \bar f(x_i)$, we
have also

$$f(x_i) - \bar y(x_i) = E(x_i) + R(x_i) \tag{7.4.18}$$

From (7.4.2) and (7.4.17), there follows

$$E(x_i) + R(x_i) = \sum_{k=0}^{n} (a_k - \bar a_k) \phi_k(x_i) \tag{7.4.19}$$

or, after using (7.4.10),

$$E(x_i) + R(x_i) = \sum_{k=0}^{n} \phi_k(x_i) \sum_{\nu=0}^{N} w(x_\nu) \Phi_k(x_\nu) E(x_\nu) \tag{7.4.20}$$

If we multiply both members of (7.4.19) by $w(x_i)R(x_i)$ and sum over i,
making use of the fact that $\sum_i w(x_i) \phi_k(x_i) R(x_i) = 0$, in accordance with (7.2.8),
there follows

$$\sum_{i=0}^{N} w(x_i) E(x_i) R(x_i) = -\sum_{i=0}^{N} w(x_i) R^2(x_i) \tag{7.4.21}$$

Also, by multiplying both members of (7.4.20) by $w(x_i)E(x_i)$, summing over i,
and making use of (7.4.21), there follows

$$\sum_{i=0}^{N} w(x_i) E^2(x_i) - \sum_{i=0}^{N} w(x_i) R^2(x_i) = \sum_{i=0}^{N} \sum_{k=0}^{n} \sum_{\nu=0}^{N} w(x_i) \phi_k(x_i) w(x_\nu) \Phi_k(x_\nu) E(x_i) E(x_\nu) \tag{7.4.22}$$
If we now average the equal members of (7.4.22) over a set of observations,
and again assume that $[E(x_i)E(x_j)]_m = 0$ when $i \ne j$ and $[w(x_i)E^2(x_i)]_m =
\sigma^2$ for $i = 0, \ldots, N$, only the terms for which $\nu = i$ will remain in the right-hand
member, and there follows

$$(N + 1)\sigma^2 - \sum_{i=0}^{N} w(x_i)[R^2(x_i)]_m = \sum_{k=0}^{n} \left[ \sum_{i=0}^{N} w(x_i) \phi_k(x_i) \Phi_k(x_i) \right] \sigma^2 = (n + 1)\sigma^2 \tag{7.4.23}$$

since the sum in brackets in the first right-hand member is unity by (7.4.14).
Thus we deduce that

$$\sigma^2 = \frac{1}{N - n} \sum_{i=0}^{N} w(x_i)[R^2(x_i)]_m \tag{7.4.24}$$
We may notice also that, from (7.4.18) and (7.4.21), there follows

$$\sum_{i=0}^{N} w(x_i)[f(x_i) - \bar y(x_i)]^2 = \sum_{i=0}^{N} w(x_i) E^2(x_i) - \sum_{i=0}^{N} w(x_i) R^2(x_i)$$

so that the mean of this weighted sum over the available set of observations is
given by the right-hand member of (7.4.23). Thus (7.4.24) permits us to deduce
also that

$$\sum_{i=0}^{N} w(x_i) \left[ \{f(x_i) - \bar y(x_i)\}^2 \right]_m = \frac{n + 1}{N - n} \sum_{i=0}^{N} w(x_i)[R^2(x_i)]_m \tag{7.4.25}$$

When only the residuals which correspond to a single set of given ordinates
are available, the best estimate of the mean value of $R^2(x_i)$ consists of the single
calculated value, so that $[R^2(x_i)]_m$ must be replaced by $R^2(x_i)$ in (7.4.24)
and (7.4.25).

It is convenient to summarize the preceding results in the case when
$w(x) = 1$, which is of most common occurrence. Here we may write $\bar\epsilon_{\mathrm{RMS}}^2$
for the mean value of the squares of the N + 1 residuals $R(x_i) = \bar f(x_i) - \bar y(x_i)$,
so that

$$\bar\epsilon_{\mathrm{RMS}} = \sqrt{\frac{\sum [R(x_i)]^2}{N + 1}} \tag{7.4.26}$$

measures the RMS departure between the observed function and its calculated
approximation over the N + 1 points involved. Since here $\sigma^2 = [E^2(x_i)]_m$,
the relation (7.4.24) then affords the estimate

$$E_{\mathrm{RMS}} \approx \sqrt{\frac{\sum [R(x_i)]^2}{N - n}} = \sqrt{\frac{N + 1}{N - n}}\; \bar\epsilon_{\mathrm{RMS}} \tag{7.4.27}$$

of the RMS departure between the true function and the observed function
over those points and (7.4.25) yields the estimate

$$(f - \bar y)_{\mathrm{RMS}} \approx \sqrt{\frac{(n + 1)(N + 1)}{N - n}}\; \bar\epsilon_{\mathrm{RMS}} \tag{7.4.28}$$

for the RMS departure over that set of points between the true function and the
least-squares approximation to the observed function.
Also, the combination of (7.4.16) and (7.4.25) gives

$$(\delta a_r)_{\mathrm{RMS}} = \sqrt{[(a_r - \bar a_r)^2]_m} \approx \sqrt{\frac{C_{rr}}{D}}\; E_{\mathrm{RMS}} \approx \sqrt{\frac{C_{rr}}{D}} \sqrt{\frac{\sum [R(x_i)]^2}{N - n}} \tag{7.4.29}$$

as an estimate of the RMS error in the rth calculated coefficient $\bar a_r$. In each
case, N + 1 denotes the number of points employed and n + 1 the number of
independent coordinate functions. The estimates (7.4.27) to (7.4.29) are
essentially based on the assumption that sufficiently many $\phi$'s are used to ensure
that the true function can be expressed in the form $\sum a_k \phi_k(x)$, apart from
negligible deviations, for some choice of the $a$'s, and are to be used in the more
general case with a corresponding degree of caution. These estimates are
properly meaningless when N = n since then all the data are needed to determine
the approximation and no data remain for the estimation of the error.

If the given data in the preceding example are empirical, the figure $\bar\epsilon_{\mathrm{RMS}} =
0.042$ thus represents the RMS departure between the observed function and
the "smoothed function" over the five relevant points. Since also $C_{00} = 30$,
$C_{11} = 5$, and $D = 50$ in (7.3.7), (7.4.29) gives

$$(\delta a_0)_{\mathrm{RMS}} \approx 0.8\,E_{\mathrm{RMS}} \qquad (\delta a_1)_{\mathrm{RMS}} \approx 0.3\,E_{\mathrm{RMS}}$$

where $E_{\mathrm{RMS}}$ is the RMS value of the observational errors, under the assumption
that the true function is linear. Also, if use is made of (7.4.27), with n = 1
and N = 4, we obtain the estimate

$$E_{\mathrm{RMS}} \approx \sqrt{5/3}\,(0.042) = 0.055$$

under the same assumption. Accordingly, the RMS errors in $\bar a_0$ and $\bar a_1$ then
may be estimated as about 0.044 and 0.016, respectively, and also (7.4.25)
then affords the estimate $\sqrt{10/3}\,\bar\epsilon_{\mathrm{RMS}} = \sqrt{2}\,E_{\mathrm{RMS}} = 0.076$ for the RMS value of
the departure between the unknown true function $f(x)$ and the smoothed
function $\bar y(x)$ over the five data points. On the other hand, if an independent
estimate of the RMS value of the observational errors were available, a
comparison of that estimate with the estimate of $E_{\mathrm{RMS}}$ obtained here would serve to
indicate the validity of the assumption that the true function is indeed linear.
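
The estimates (7.4.27) and (7.4.29) can be sketched numerically for a straight-line fit with unit weights. The following is a minimal illustration, not the text's own computation: the ordinates in `fbar` are hypothetical values chosen for the sketch (the data of the preceding example are not reproduced here), and the function name is an assumption. Only the abscissas 0, 1, 2, 3, 4, which give $C_{00} = 30$, $C_{11} = 5$, and $D = 50$, are taken from the text.

```python
import math

def line_fit_error_estimates(xs, fbar):
    """Fit y = a0 + a1*x by least squares (unit weights) and return the
    estimates of (7.4.26), (7.4.27) and (7.4.29) for phi_0 = 1, phi_1 = x."""
    n_pts = len(xs)                  # N + 1 data points
    n_fun = 2                        # n + 1 coordinate functions
    # Normal-equation coefficients (7.4.4) and right-hand sides (7.4.5).
    c00 = float(n_pts)
    c01 = float(sum(xs))
    c11 = float(sum(x * x for x in xs))
    v0 = sum(fbar)
    v1 = sum(x * f for x, f in zip(xs, fbar))
    det = c00 * c11 - c01 * c01      # D, the determinant of C
    cof00, cof01, cof11 = c11, -c01, c00   # cofactors of the 2 x 2 matrix
    # Solution (7.4.7) through the reduced cofactors C_rk / D.
    a0 = (cof00 * v0 + cof01 * v1) / det
    a1 = (cof01 * v0 + cof11 * v1) / det
    ss = sum((f - (a0 + a1 * x)) ** 2 for x, f in zip(xs, fbar))
    eps_rms = math.sqrt(ss / n_pts)              # (7.4.26)
    e_rms = math.sqrt(ss / (n_pts - n_fun))      # (7.4.27); N - n = 3 here
    da0 = math.sqrt(cof00 / det) * e_rms         # (7.4.29), r = 0
    da1 = math.sqrt(cof11 / det) * e_rms         # (7.4.29), r = 1
    return {"a": (a0, a1), "D": det, "eps_rms": eps_rms,
            "E_rms": e_rms, "da": (da0, da1)}

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
fbar = [1.02, 1.95, 3.07, 3.96, 5.05]   # hypothetical observed ordinates
est = line_fit_error_estimates(xs, fbar)
```

The factors multiplying $E_{\mathrm{RMS}}$ in `est["da"]` are $\sqrt{30/50}$ and $\sqrt{5/50}$, which round to the values 0.8 and 0.3 quoted above.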

The solution of the normal equations and the evaluation of the relevant 
determinant and cofactors can be conveniently effected by the use of procedures 
described in Secs. 10.4 to 10.6. It should be noted, however, that unfortunately 
the set of normal equations often tends to be ill-conditioned in the sense that 
small errors in the coefficients or in the numerical solution process may lead to 
large errors in the solution of this set. This situation clearly can be avoided 
by the use of orthogonal coordinate functions (see Secs. 7.12 and 7.16), for 
which the normal equations are uncoupled, as was seen in Sec. 7.2. 

The same methods are used more generally in dealing with sets of linear 
equations, in which there are more equations than unknowns, whether or not 
they arise from a problem (7.3.1) in “curve fitting.” In general, the original set is 
inconsistent, and does not possess a solution. The normal equations then 
correspond to the result of minimizing the sum of the (weighted or unweighted) 
squared deviations between the two members of those equations. If the 
squared deviation associated with the kth equation is to be weighted by $w_k$,
the same end result can be obtained alternatively by multiplying both sides
of that equation by $\sqrt{w_k}$ and using a unit weight in forming the normal equations.
In this connection, it should be noticed, for example, that the equations x = 2.3
and 5x = 11.5 are not equivalent, if the right-hand members are known only
to be correct to the places given, since the first assertion is equivalent to the
condition 2.25 < x < 2.35 and the second to the condition 2.29 < x < 2.31.
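
The equivalence between weighting the kth squared deviation by $w_k$ and scaling the kth equation by $\sqrt{w_k}$ can be checked on a small overdetermined set in one unknown. The sketch below is illustrative only; the function names, the third equation, and the weights are assumptions made for the example.

```python
import math

def weighted_solution(coeffs, rhs, weights):
    """Minimize sum_k w_k * (c_k * x - r_k)**2 for a single unknown x
    (the one-unknown normal equation)."""
    num = sum(w * c * r for c, r, w in zip(coeffs, rhs, weights))
    den = sum(w * c * c for c, w in zip(coeffs, weights))
    return num / den

def scaled_unit_weight_solution(coeffs, rhs, weights):
    """Multiply equation k by sqrt(w_k), then solve with unit weights."""
    sc = [math.sqrt(w) * c for c, w in zip(coeffs, weights)]
    sr = [math.sqrt(w) * r for r, w in zip(rhs, weights)]
    return weighted_solution(sc, sr, [1.0] * len(sc))

# Equations x = 2.3, 5x = 11.5, 2x = 4.7 with hypothetical weights.
coeffs = [1.0, 5.0, 2.0]
rhs = [2.3, 11.5, 4.7]
weights = [1.0, 0.04, 0.25]
x_weighted = weighted_solution(coeffs, rhs, weights)
x_scaled = scaled_unit_weight_solution(coeffs, rhs, weights)
```

Both routes lead to the same normal equation, so `x_weighted` and `x_scaled` agree to rounding error.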

In this more general case, the coefficients of the left-hand members of
the original equations, as well as the right-hand members, may be subject to
error. Here, if the normal equations are again represented by (7.4.1), and if
$E_{\mathrm{RMS}}$ represents the RMS error of each of the right-hand members of the original
equations, whereas $\eta_{\mathrm{RMS}}$ denotes the RMS error of each coefficient in the original
set, the estimate (7.4.29) is to be replaced by

$$(\delta a_r)_{\mathrm{RMS}} \approx \sqrt{\frac{C_{rr}}{D}} \sqrt{E_{\mathrm{RMS}}^2 + (\bar a_0^2 + \bar a_1^2 + \cdots + \bar a_n^2)\,\eta_{\mathrm{RMS}}^2} \tag{7.4.30}$$

when $w(x) = 1$, under the assumption that all errors are small, random, and
independent, and that the RMS errors in the coefficients of the original equations
are all equal.


7.5 Orthogonal Polynomials


We consider next the case when a least-squares approximation is to be effected
over the interval [a, b], and we attempt first to construct a set of polynomials
$\phi_0(x), \phi_1(x), \ldots, \phi_r(x), \ldots$, such that each member is orthogonal to all others
in the set over [a, b] relative to a specified weighting function $w(x)$ which is
nonnegative in that interval.† It is convenient to ask that $\phi_r(x)$ be a polynomial
of degree r. The problem then will be solved, in particular, if we obtain a
polynomial $\phi_r(x)$ which is orthogonal over [a, b] to all polynomials of degree
inferior to r.

Thus we require a polynomial $\phi_r(x)$, of degree r, such that

$$\int_a^b w(x) \phi_r(x) q_{r-1}(x)\, dx = 0 \tag{7.5.1}$$

where $w$ is specified and where $q_{r-1}$ is an arbitrary polynomial of degree r - 1
or less. In order to express this requirement in a more useful form, we integrate
by parts r times, making use of the fact that $q_{r-1}^{(r)}(x) \equiv 0$. For this purpose, we
first introduce the notation

$$w(x) \phi_r(x) = \frac{d^r U_r(x)}{dx^r} \tag{7.5.2}$$

so that (7.5.1) becomes

$$\int_a^b U_r^{(r)}(x)\, q_{r-1}(x)\, dx = 0$$

or, after r integrations by parts,

$$\left[ U_r^{(r-1)} q_{r-1} - U_r^{(r-2)} q_{r-1}' + U_r^{(r-3)} q_{r-1}'' - \cdots + (-1)^{r-1} U_r\, q_{r-1}^{(r-1)} \right]_a^b = 0 \tag{7.5.3}$$
The requirement that the function $\phi_r(x)$ defined by (7.5.2),

$$\phi_r(x) = \frac{1}{w(x)} \frac{d^r U_r(x)}{dx^r} \tag{7.5.4}$$

be a polynomial of degree r implies that $U_r(x)$ must satisfy the differential
equation

$$\frac{d^{r+1}}{dx^{r+1}} \left[ \frac{1}{w(x)} \frac{d^r U_r(x)}{dx^r} \right] = 0 \tag{7.5.5}$$

in [a, b], whereas the requirement that (7.5.3) be satisfied for any values of
$q_{r-1}(a)$, $q_{r-1}(b)$, $q_{r-1}'(a)$, $q_{r-1}'(b)$, and so forth, is met by satisfaction of the 2r
boundary conditions

$$U_r(a) = U_r'(a) = U_r''(a) = \cdots = U_r^{(r-1)}(a) = 0 \tag{7.5.6}$$

$$U_r(b) = U_r'(b) = U_r''(b) = \cdots = U_r^{(r-1)}(b) = 0 \tag{7.5.7}$$

Thus if, for each integer r, a solution of (7.5.5) which satisfies (7.5.6)
and (7.5.7) can be obtained, the rth member of the required set of functions is
given by (7.5.4). From the homogeneity of these conditions, it follows that
each such solution will contain an arbitrary multiplicative constant. It is known
(see Szegö [1967]) that the problem thus formulated does indeed possess a
(nontrivial) solution, even when a and/or b is infinite, under the assumptions that
$w(x) \ge 0$ in the interval and that $\int_a^b x^k w(x)\, dx$ exists for all nonnegative integral
values of k.


† For alternative derivations, see Sec. 7.10 and Prob. 40.

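In accordance with the results of Sec. 7.2, the coefficients in the expression

$$y(x) = \sum_{r=0}^{n} a_r \phi_r(x) \tag{7.5.8}$$

are then determined by the requirement

$$\int_a^b w(x)[f(x) - y(x)]^2\, dx = \min \tag{7.5.9}$$

in the form

$$a_r = \frac{\int_a^b w f \phi_r\, dx}{\int_a^b w \phi_r^2\, dx} \equiv \frac{1}{\gamma_r} \int_a^b w f \phi_r\, dx \tag{7.5.10}$$

where

$$\gamma_r = \int_a^b w \phi_r^2\, dx \tag{7.5.11}$$

Although the numerator in (7.5.10) depends upon $f$, the denominator $\gamma_r$ is
independent of $f$ and can be determined once and for all.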

The calculation of $\gamma_r$ is facilitated by the following considerations. If
we write

$$\phi_r(x) = A_{r0} + A_{r1} x + \cdots + A_{rr} x^r \tag{7.5.12}$$

so that $A_{rk}$ is the coefficient of $x^k$ in $\phi_r(x)$ and $A_r \equiv A_{rr}$ is its leading coefficient,
there follows

$$\gamma_r = \int_a^b w(x) \phi_r(x) \phi_r(x)\, dx = \int_a^b w(x) \phi_r(x) [A_{r0} + A_{r1} x + \cdots + A_r x^r]\, dx$$

and hence if we recall the relations

$$\int_a^b w(x) \phi_r(x)\, x^i\, dx = 0 \qquad (i = 0, 1, \ldots, r - 1) \tag{7.5.13}$$

which are equivalent to (7.5.1), we may deduce that

$$\gamma_r = A_r \int_a^b x^r w(x) \phi_r(x)\, dx = A_r \int_a^b x^r U_r^{(r)}(x)\, dx$$

By integrating by parts r times and making use of (7.5.6) and (7.5.7), this relation
takes the convenient form

$$\gamma_r \equiv \int_a^b w(x) \phi_r^2(x)\, dx = (-1)^r r!\, A_r \int_a^b U_r(x)\, dx \tag{7.5.14}$$

where $A_r$ is the coefficient of $x^r$ in $\phi_r(x)$.

It is useful to notice that the problem specified by (7.5.8) and (7.5.9)
can be generalized in the following way. It may happen that $f(x)$ clearly
cannot be satisfactorily approximated over [a, b] by a polynomial of low
degree, but that a certain function $v(x)$ is known such that the ratio $f(x)/v(x)$
can be so approximated. Thus, if we determine the coefficients of the polynomial

$$y(x) = \sum_{r=0}^{n} b_r \phi_r(x) \tag{7.5.15}$$

in such a way that

$$\int_a^b w(x) \left[ \frac{f(x)}{v(x)} - \sum_{r=0}^{n} b_r \phi_r(x) \right]^2 dx = \min \tag{7.5.16}$$

the orthogonality of the $\phi$'s relative to $w$ leads to the result

$$b_r = \frac{1}{\gamma_r} \int_a^b \frac{w}{v} f \phi_r\, dx \tag{7.5.17}$$

It is seen that (7.5.16) is equivalent to the result of minimizing the squared
error $(f - vy)^2$ with the weighting function $w/v^2$. The choice $w(x) = v(x)$ is a
frequently useful one. Several examples of such approximations are considered
in the following sections.


7.6 Legendre Approximation 


For least-squares approximation over an interval of finite length, it is convenient 
to suppose that a linear change in variables has transformed that interval into 
the interval [—1, 1]. We consider here the case when the weighting function 
is unity: 

$$w(x) = 1 \tag{7.6.1}$$

The differential equation (7.5.5) then becomes

$$\frac{d^{2r+1} U_r}{dx^{2r+1}} = 0 \tag{7.6.2}$$

and the boundary conditions (7.5.6) and (7.5.7) take the form

$$U_r(\pm 1) = U_r'(\pm 1) = \cdots = U_r^{(r-1)}(\pm 1) = 0 \tag{7.6.3}$$

from which there follows (analytically or by inspection)

$$U_r = C_r (x^2 - 1)^r \tag{7.6.4}$$

where $C_r$ is an arbitrary constant. Hence, from (7.5.4), it follows that the rth
relevant orthogonal polynomial is of the form

$$\phi_r(x) = C_r \frac{d^r}{dx^r} (x^2 - 1)^r \tag{7.6.5}$$

With $C_r = 1/(2^r r!)$, the polynomial so obtained is known as the rth
Legendre polynomial and is usually denoted by $P_r(x)$. The relation

$$P_r(x) = \frac{1}{2^r r!} \frac{d^r}{dx^r} (x^2 - 1)^r \tag{7.6.6}$$

is often called the Rodrigues formula for $P_r(x)$. From the preceding derivation,
it follows that

$$\int_{-1}^{1} P_r(x) P_s(x)\, dx = 0 \qquad (r \ne s) \tag{7.6.7}$$

where r and s are nonnegative integers. The value assigned to $C_r$ is such that
$P_r(1) = 1$, and it is true also that $|P_r(x)| \le 1$ when $|x| \le 1$.

The first six of these polynomials may be obtained in the forms

$$\begin{aligned} P_0(x) &= 1 & P_1(x) &= x \\ P_2(x) &= \tfrac{1}{2}(3x^2 - 1) & P_3(x) &= \tfrac{1}{2}(5x^3 - 3x) \\ P_4(x) &= \tfrac{1}{8}(35x^4 - 30x^2 + 3) & P_5(x) &= \tfrac{1}{8}(63x^5 - 70x^3 + 15x) \end{aligned} \tag{7.6.8}$$

and additional ones can be determined from the recurrence formula†

$$P_{r+1}(x) = \frac{2r + 1}{r + 1}\, x P_r(x) - \frac{r}{r + 1}\, P_{r-1}(x) \tag{7.6.9}$$

It may be noted that the polynomials of even and odd degrees are even and
odd functions of x, respectively.


† For the derivation of this formula and of other similar formulas to be listed in
following sections, see Sec. 7.10 and Szegö [1967].


In order to evaluate the factor (7.5.14), we notice first that (7.6.6) gives

$$P_r(x) = \frac{1}{2^r r!} \frac{d^r}{dx^r} (x^{2r} - \cdots) = \frac{(2r)!}{2^r (r!)^2}\, x^r + \cdots$$

so that

$$A_r = \frac{(2r)!}{2^r (r!)^2}$$

Hence (7.5.14) gives

$$\gamma_r \equiv \int_{-1}^{1} P_r^2(x)\, dx = \frac{(2r)!}{2^{2r} (r!)^2} \int_{-1}^{1} (1 - x^2)^r\, dx = \frac{(2r)!}{2^{2r} (r!)^2} \cdot \frac{2^{2r+1} (r!)^2}{(2r + 1)!} = \frac{2}{2r + 1} \tag{7.6.10}$$
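
The recurrence (7.6.9), the orthogonality property (7.6.7), and the value $\gamma_r = 2/(2r + 1)$ of (7.6.10) can be checked in exact rational arithmetic. The sketch below stores each polynomial as a list of coefficients (index k holding the coefficient of $x^k$); the function names are assumptions made for the illustration.

```python
from fractions import Fraction

def legendre_coefficients(n_max):
    """Return [P_0, ..., P_n_max] as lists of Fractions, where entry k is
    the coefficient of x**k, built from the recurrence (7.6.9)."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for r in range(1, n_max):
        prev, cur = polys[r - 1], polys[r]
        nxt = [Fraction(0)] * (r + 2)
        for k, c in enumerate(cur):        # (2r+1)/(r+1) * x * P_r
            nxt[k + 1] += Fraction(2 * r + 1, r + 1) * c
        for k, c in enumerate(prev):       # - r/(r+1) * P_{r-1}
            nxt[k] -= Fraction(r, r + 1) * c
        polys.append(nxt)
    return polys[: n_max + 1]

def integrate_product(p, q):
    """Exact integral of p(x) * q(x) over [-1, 1]."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:           # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total

P = legendre_coefficients(5)
gamma = [integrate_product(P[r], P[r]) for r in range(6)]
```

Here `P[5]` reproduces the coefficients of $P_5(x)$ in (7.6.8), and `gamma[r]` equals `Fraction(2, 2*r + 1)` for each r computed.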
Thus the nth-degree least-squares polynomial approximation to $f(x)$
over [-1, 1] relevant to a constant weighting function is defined by

$$y(x) = \sum_{r=0}^{n} a_r P_r(x) \qquad (-1 \le x \le 1) \tag{7.6.11}$$

where

$$a_r = \frac{2r + 1}{2} \int_{-1}^{1} f(x) P_r(x)\, dx \tag{7.6.12}$$

It has the property that, for all polynomials $y_n(x)$ of degree n or less, the
integrated squared error

$$\int_{-1}^{1} [f(x) - y_n(x)]^2\, dx$$

is least when $y_n(x)$ is identified with the polynomial defined by (7.6.11). By
virtue of (7.2.14), the minimum value is given by

$$\int_{-1}^{1} f^2\, dx - \sum_{r=0}^{n} \frac{2}{2r + 1}\, a_r^2 \tag{7.6.13}$$
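
As an illustration of (7.6.12) and (7.6.13), the following sketch computes a second-degree Legendre approximation to $e^x$ on [-1, 1]. The choice of $f$, the composite Simpson rule used for the integrals, and all names are assumptions made for the example; any sufficiently accurate quadrature could be substituted.

```python
import math

def simpson(g, a, b, m=200):
    """Composite Simpson rule with m (even) subintervals."""
    h = (b - a) / m
    s = g(a) + g(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

# First three Legendre polynomials, from (7.6.8).
P = [lambda x: 1.0,
     lambda x: x,
     lambda x: 0.5 * (3 * x * x - 1)]

def legendre_coeffs(f, n):
    """a_r = (2r+1)/2 * integral of f * P_r over [-1, 1], as in (7.6.12)."""
    return [(2 * r + 1) / 2 * simpson(lambda x: f(x) * P[r](x), -1.0, 1.0)
            for r in range(n + 1)]

f = math.exp
a = legendre_coeffs(f, 2)

def y(x):
    """The approximation (7.6.11) with n = 2."""
    return sum(a_r * P[r](x) for r, a_r in enumerate(a))

# Integrated squared error, computed directly and from (7.6.13).
direct = simpson(lambda x: (f(x) - y(x)) ** 2, -1.0, 1.0)
via_7_6_13 = simpson(lambda x: f(x) ** 2, -1.0, 1.0) - sum(
    2 / (2 * r + 1) * a_r ** 2 for r, a_r in enumerate(a))
```

For this $f$ the coefficients have the closed forms $a_0 = \sinh 1$, $a_1 = 3/e$, and $a_2 = \tfrac{5}{2}(e - 7/e)$, which gives an independent check on the quadrature.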


In accordance with (7.5.15) to (7.5.17), it follows also that the least-squares
approximation to $f(x)$ of the form $v(x)y(x)$, with $y(x)$ a polynomial

$$y(x) = \sum_{r=0}^{n} b_r P_r(x) \qquad (-1 \le x \le 1) \tag{7.6.14}$$

and with the weighting function $1/[v(x)]^2$, where $v(x)$ is a specified function, is
that for which

$$b_r = \frac{2r + 1}{2} \int_{-1}^{1} \frac{f(x)}{v(x)} P_r(x)\, dx \tag{7.6.15}$$


7.7 Laguerre Approximation


For least-squares polynomial approximation over a semi-infinite interval, it is
convenient to first transform that interval into the interval $[0, \infty)$ by a translation
of the origin. A frequently used approximation makes use of a weighting function
of the form

$$w(x) = e^{-\alpha x} \tag{7.7.1}$$

where $\alpha$ is a positive constant, taken to be sufficiently large to ensure the existence
of the integral of the squared error over the semi-infinite interval (when this is
possible).

From the results of Sec. 7.5, the relevant orthogonal polynomials are such
that

$$\phi_r(x) = e^{\alpha x} \frac{d^r U_r}{dx^r} \tag{7.7.2}$$

where

$$\frac{d^{r+1}}{dx^{r+1}} \left[ e^{\alpha x} \frac{d^r U_r}{dx^r} \right] = 0 \tag{7.7.3}$$

and

$$U_r(0) = U_r'(0) = \cdots = U_r^{(r-1)}(0) = 0 \tag{7.7.4}$$

$$U_r(\infty) = U_r'(\infty) = \cdots = U_r^{(r-1)}(\infty) = 0 \tag{7.7.5}$$

The general solution of (7.7.3) is readily found to be

$$U_r = e^{-\alpha x}(c_0 + c_1 x + \cdots + c_r x^r) + d_0 + d_1 x + \cdots + d_{r-1} x^{r-1}$$

where the $c$'s and $d$'s are arbitrary constants. The conditions (7.7.5) require
that all $d$'s vanish, and (7.7.4) gives $c_0 = c_1 = \cdots = c_{r-1} = 0$, so that there
follows

$$U_r(x) = C_r x^r e^{-\alpha x} \tag{7.7.6}$$

and hence

$$\phi_r(x) = C_r e^{\alpha x} \frac{d^r}{dx^r} (x^r e^{-\alpha x}) \tag{7.7.7}$$

With $C_r = 1$ and $\alpha = 1$, this polynomial is called the rth Laguerre polynomial
and is usually denoted by $L_r(x)$:

$$L_r(x) = e^{x} \frac{d^r}{dx^r} (x^r e^{-x}) \tag{7.7.8}$$

It follows that, again taking $C_r = 1$, the polynomial (7.7.7) can be expressed
in the form

$$\phi_r(x) = L_r(\alpha x) \tag{7.7.9}$$

and that we have the orthogonality property

$$\int_0^\infty e^{-\alpha x} L_r(\alpha x) L_s(\alpha x)\, dx = 0 \qquad (r \ne s) \tag{7.7.10}$$

when r and s are nonnegative integers.

The first six of the Laguerre polynomials can be obtained in the form

$$\begin{aligned} L_0(x) &= 1 \qquad L_1(x) = 1 - x \qquad L_2(x) = 2 - 4x + x^2 \\ L_3(x) &= 6 - 18x + 9x^2 - x^3 \qquad L_4(x) = 24 - 96x + 72x^2 - 16x^3 + x^4 \\ L_5(x) &= 120 - 600x + 600x^2 - 200x^3 + 25x^4 - x^5 \end{aligned} \tag{7.7.11}$$

and additional ones can be determined from the recurrence formula

$$L_{r+1}(x) = (1 + 2r - x) L_r(x) - r^2 L_{r-1}(x) \tag{7.7.12}$$


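The value assigned to $C_r$ is such that the coefficient of $x^r$ in $L_r(\alpha x)$ is $(-\alpha)^r$.
Hence, from (7.5.14), there follows

$$\gamma_r \equiv \int_0^\infty e^{-\alpha x} L_r^2(\alpha x)\, dx = r!\, \alpha^r \int_0^\infty x^r e^{-\alpha x}\, dx = \frac{(r!)^2}{\alpha} \tag{7.7.13}$$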
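
The recurrence (7.7.12), the orthogonality property (7.7.10), and the factor (7.7.13) can be checked exactly for $\alpha = 1$, since $\int_0^\infty x^k e^{-x}\, dx = k!$. The sketch below uses integer coefficient lists (index k holding the coefficient of $x^k$) and the normalization of (7.7.8), which differs by a factor $r!$ from the normalization used in many software libraries; the function names are assumptions made for the illustration.

```python
from math import factorial

def laguerre_coefficients(n_max):
    """Return [L_0, ..., L_n_max] as integer coefficient lists, using the
    recurrence (7.7.12) in the normalization of (7.7.8)."""
    polys = [[1], [1, -1]]
    for r in range(1, n_max):
        prev, cur = polys[r - 1], polys[r]
        nxt = [0] * (r + 2)
        for k, c in enumerate(cur):
            nxt[k] += (1 + 2 * r) * c      # (1 + 2r) * L_r
            nxt[k + 1] -= c                # - x * L_r
        for k, c in enumerate(prev):
            nxt[k] -= r * r * c            # - r^2 * L_{r-1}
        polys.append(nxt)
    return polys[: n_max + 1]

def weighted_inner(p, q):
    """Integral of exp(-x) * p(x) * q(x) over [0, infinity), using the
    fact that the integral of x**k * exp(-x) is k!."""
    return sum(a * b * factorial(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

L = laguerre_coefficients(5)
gamma = [weighted_inner(L[r], L[r]) for r in range(6)]
```

With $\alpha = 1$, each `gamma[r]` equals `factorial(r) ** 2`, in agreement with (7.7.13).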


Thus the nth-degree least-squares polynomial approximation to $f(x)$
over $[0, \infty)$, relevant to the weighting function $w(x) = e^{-\alpha x}$, is defined by

$$y(x) = \sum_{r=0}^{n} a_r L_r(\alpha x) \qquad (0 \le x < \infty) \tag{7.7.14}$$

where

$$a_r = \frac{\alpha}{(r!)^2} \int_0^\infty e^{-\alpha x} f(x) L_r(\alpha x)\, dx \tag{7.7.15}$$

It has the property that, for all polynomials $y_n(x)$ of degree n or less, the integrated
weighted squared error

$$\int_0^\infty e^{-\alpha x} [f(x) - y_n(x)]^2\, dx$$

is least when $y_n(x)$ is identified with the right-hand member of (7.7.14). In order
for this integral to exist, it is generally necessary that $|f(x)|$ grow less rapidly
than $e^{\alpha x/2}$ as $x \to \infty$.

Another type of approximation employing Laguerre polynomials is
obtained if we require the coefficients in the relation

$$y(x) = \sum_{r=0}^{n} b_r L_r(\alpha x) \qquad (0 \le x < \infty) \tag{7.7.16}$$

to be such that

$$\int_0^\infty e^{\alpha x} [f(x) - e^{-\alpha x} y(x)]^2\, dx = \int_0^\infty e^{-\alpha x} \left[ e^{\alpha x} f(x) - \sum_{r=0}^{n} b_r L_r(\alpha x) \right]^2 dx = \min \tag{7.7.17}$$

This is a special case of the problem specified by (7.5.15) to (7.5.17) in which
$v(x) = w(x)$, and the coefficients are thus obtained in the form

$$b_r = \frac{\alpha}{(r!)^2} \int_0^\infty f(x) L_r(\alpha x)\, dx \tag{7.7.18}$$

In order that the integrals in (7.7.17) exist, it is generally necessary that $f(x)$
tend to zero more rapidly than $e^{-\alpha x/2}$ as $x \to \infty$.


7.8 Hermite Approximation 


Over the doubly infinite interval $(-\infty < x < \infty)$, a frequently used weighting
function is of the form

$$w(x) = e^{-\alpha^2 x^2} \tag{7.8.1}$$

In this case the relevant orthogonal polynomials are defined by

$$\phi_r(x) = e^{\alpha^2 x^2} \frac{d^r U_r}{dx^r} \tag{7.8.2}$$

where $U_r$ satisfies the equation

$$\frac{d^{r+1}}{dx^{r+1}} \left[ e^{\alpha^2 x^2} \frac{d^r U_r}{dx^r} \right] = 0 \tag{7.8.3}$$

and where $U_r$ and its first r - 1 derivatives are to tend to zero as $x \to \pm\infty$.
Since the function

$$U_r(x) = C_r e^{-\alpha^2 x^2} \tag{7.8.4}$$

has the property that its rth derivative is the product of itself and a polynomial
of degree r, it satisfies these conditions and there follows

$$\phi_r(x) = C_r e^{\alpha^2 x^2} \frac{d^r}{dx^r} (e^{-\alpha^2 x^2}) \tag{7.8.5}$$

The Hermite polynomial of degree r is usually defined by taking $C_r =
(-1)^r$ and, in addition, either $\alpha^2 = 1$ or $\alpha^2 = \tfrac{1}{2}$ in (7.8.5). Both definitions are
used in the literature. We adopt the former one and write

$$H_r(x) = (-1)^r e^{x^2} \frac{d^r}{dx^r} (e^{-x^2}) \tag{7.8.6}$$

so that, with the choice $C_r = (-\alpha)^{-r}$, (7.8.5) becomes†

$$\phi_r(x) = H_r(\alpha x) = (-\alpha)^{-r} e^{\alpha^2 x^2} \frac{d^r}{dx^r} (e^{-\alpha^2 x^2}) \tag{7.8.7}$$

Thus Hermite polynomials possess the orthogonality property

$$\int_{-\infty}^{\infty} e^{-\alpha^2 x^2} H_r(\alpha x) H_s(\alpha x)\, dx = 0 \qquad (r \ne s) \tag{7.8.8}$$

when r and s are nonnegative integers. The first six of the polynomials defined
by (7.8.6) are obtained in the form

$$\begin{aligned} H_0(x) &= 1 & H_1(x) &= 2x \\ H_2(x) &= 4x^2 - 2 & H_3(x) &= 8x^3 - 12x \\ H_4(x) &= 16x^4 - 48x^2 + 12 & H_5(x) &= 32x^5 - 160x^3 + 120x \end{aligned} \tag{7.8.9}$$

and additional ones can be determined from the recurrence formula

$$H_{r+1}(x) = 2x H_r(x) - 2r H_{r-1}(x) \tag{7.8.10}$$
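With $A_r = (2\alpha)^r$ and $U_r = (-1/\alpha)^r e^{-\alpha^2 x^2}$, Eq. (7.5.14) gives

$$\gamma_r \equiv \int_{-\infty}^{\infty} e^{-\alpha^2 x^2} H_r^2(\alpha x)\, dx = 2^r r! \int_{-\infty}^{\infty} e^{-\alpha^2 x^2}\, dx = \frac{2^r r!}{\alpha} \sqrt{\pi} \tag{7.8.11}$$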
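
For $\alpha = 1$ the recurrence (7.8.10), the orthogonality property (7.8.8), and the factor (7.8.11) can be checked exactly, apart from the common factor $\sqrt{\pi}$, by using the standard moments $\int_{-\infty}^{\infty} x^{2k} e^{-x^2}\, dx = \sqrt{\pi}\,(2k)!/(4^k k!)$. The sketch below is illustrative; the function names are assumptions, and coefficient lists are indexed by the power of x.

```python
from fractions import Fraction
from math import factorial

def hermite_coefficients(n_max):
    """Return [H_0, ..., H_n_max] as integer coefficient lists, using the
    recurrence (7.8.10) in the normalization of (7.8.6)."""
    polys = [[1], [0, 2]]
    for r in range(1, n_max):
        prev, cur = polys[r - 1], polys[r]
        nxt = [0] * (r + 2)
        for k, c in enumerate(cur):
            nxt[k + 1] += 2 * c            # 2x * H_r
        for k, c in enumerate(prev):
            nxt[k] -= 2 * r * c            # - 2r * H_{r-1}
        polys.append(nxt)
    return polys[: n_max + 1]

def inner_over_sqrt_pi(p, q):
    """Integral of exp(-x^2) * p(x) * q(x) over the real line, divided by
    sqrt(pi); odd moments vanish and even moments are (2k)!/(4^k k!)."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:
                k = (i + j) // 2
                total += a * b * Fraction(factorial(2 * k),
                                          4 ** k * factorial(k))
    return total

H = hermite_coefficients(5)
gamma_over_sqrt_pi = [inner_over_sqrt_pi(H[r], H[r]) for r in range(6)]
```

Each entry of `gamma_over_sqrt_pi` equals `2**r * factorial(r)`, as (7.8.11) requires when $\alpha = 1$.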


Thus the nth-degree least-squares polynomial approximation to $f(x)$
over $(-\infty, +\infty)$, relevant to the weighting function $w(x) = e^{-\alpha^2 x^2}$, is defined
by

$$y(x) = \sum_{r=0}^{n} a_r H_r(\alpha x) \qquad (-\infty < x < \infty) \tag{7.8.12}$$

where

$$a_r = \frac{\alpha}{2^r r! \sqrt{\pi}} \int_{-\infty}^{\infty} e^{-\alpha^2 x^2} H_r(\alpha x) f(x)\, dx \tag{7.8.13}$$

It has the property that, for all polynomials $y_n(x)$ of degree n or less, the
integrated squared error

$$\int_{-\infty}^{\infty} e^{-\alpha^2 x^2} [f(x) - y_n(x)]^2\, dx$$

is least when $y_n(x)$ is identified with the right-hand member of (7.8.12). It
must be assumed that the behavior of $f(x)$ is such that this integral exists.


† With the definition $H_r(x) = (-1)^r e^{x^2/2}\, d^r(e^{-x^2/2})/dx^r$ and the corresponding choice
$C_r = (-1)^r 2^{-r/2} \alpha^{-r}$, there would follow $\phi_r(x) = H_r(\sqrt{2}\, \alpha x)$.


It should be noticed that, since the weighting function $e^{-\alpha^2 x^2}$ becomes
small very rapidly as x increases in magnitude, the least-squares criterion here
requires that the magnitude of the deviation $f(x) - y(x)$ be small when x is
small, but tolerates large values of that deviation when x is large in magnitude.
A similar remark applies somewhat less strongly to the approximation of the
preceding section. Thus, such approximations should not be used unless this
situation is an acceptable one.

Another type of approximation, of particular importance in the theory
of statistics, is obtained if we require the coefficients in the relation

$$y(x) = \sum_{r=0}^{n} b_r H_r(\alpha x) \qquad (-\infty < x < \infty) \tag{7.8.14}$$

to be such that

$$\int_{-\infty}^{\infty} e^{\alpha^2 x^2} [f(x) - e^{-\alpha^2 x^2} y(x)]^2\, dx = \int_{-\infty}^{\infty} e^{-\alpha^2 x^2} \left[ e^{\alpha^2 x^2} f(x) - \sum_{r=0}^{n} b_r H_r(\alpha x) \right]^2 dx = \min \tag{7.8.15}$$

The conditions governing the $b$'s are obtained directly, or by reference to
(7.5.15) to (7.5.17), with $v(x) = w(x)$, in the form

$$b_r = \frac{\alpha}{2^r r! \sqrt{\pi}} \int_{-\infty}^{\infty} f(x) H_r(\alpha x)\, dx \tag{7.8.16}$$

assuming that the behavior of $f(x)$, for large values of $|x|$, is such that the
integrals involved exist. In particular, the approximation (7.8.14) is often used in
situations when $f(x)$ vanishes for all values of $|x|$ which exceed a certain value.
If the ith moment of $f(x)$ is defined as

$$m_i = \int_{-\infty}^{\infty} x^i f(x)\, dx \tag{7.8.17}$$

and use is made of the explicit forms of (7.8.9), we find that the leading
coefficients in (7.8.14) are expressible in the forms

$$b_0 = \frac{\alpha}{\sqrt{\pi}}\, m_0 \qquad b_1 = \frac{\alpha^2}{\sqrt{\pi}}\, m_1 \qquad b_2 = \frac{\alpha}{4\sqrt{\pi}} (2\alpha^2 m_2 - m_0) \tag{7.8.18}$$

and that the remaining $b$'s can be similarly expressed in terms of the moments.

7.9 Chebyshev Approximation†


In cases when errors near the ends of an interval [a, b] are of particular
importance, a weighting function which is of the form $1/\sqrt{(x - a)(b - x)}$ is
often appropriate. It is supposed again that a linear change in variables has
transformed the given interval into the interval [-1, 1], so that the weighting
function becomes

$$w(x) = \frac{1}{\sqrt{1 - x^2}} \tag{7.9.1}$$


† The term "Chebyshev approximation" also is used frequently in the literature to
refer to a "minimax approximation" (see Sec. 9.9).


In order to obtain the relevant orthogonal polynomials in this case, it is
convenient to start with the basic condition (7.5.1) rather than with its
consequences. Thus we require a polynomial $\phi_r(x)$, of degree r in x, such that

$$\int_{-1}^{1} \frac{\phi_r(x)\, q_{r-1}(x)}{\sqrt{1 - x^2}}\, dx = 0 \tag{7.9.2}$$

where $q_{r-1}(x)$ is an arbitrary polynomial of degree r - 1 or less in x. If we
introduce the change in variables

$$x = \cos\theta \tag{7.9.3}$$

this requirement becomes

$$\int_0^\pi \phi_r(\cos\theta)\, q_{r-1}(\cos\theta)\, d\theta = 0 \tag{7.9.4}$$

Now, since $\cos k\theta$ is expressible as a polynomial of degree k in $\cos\theta$
and since, conversely, any polynomial of degree k in $\cos\theta$ can be expressed
as a linear combination of $1, \cos\theta, \cos 2\theta, \ldots, \cos k\theta$, it follows that (7.9.4)
will be satisfied if and only if

$$\int_0^\pi \phi_r(\cos\theta) \cos k\theta\, d\theta = 0 \qquad (k = 0, 1, \ldots, r - 1) \tag{7.9.5}$$

It is easily verified that the function

$$\phi_r(\cos\theta) = C_r \cos r\theta \tag{7.9.6}$$

has this property. Hence, returning to the variable x by using (7.9.3), we verify
that the functions

$$\phi_r(x) = C_r \cos(r \cos^{-1} x) \tag{7.9.7}$$

are the required orthogonal polynomials. With $C_r = 1$, these polynomials are
known as Chebyshev polynomials,† often denoted by $T_r(x)$, so that we may write

$$\phi_r(x) = T_r(x) = \cos(r \cos^{-1} x) \tag{7.9.8}$$

Thus, these polynomials possess the orthogonality property

$$\int_{-1}^{1} \frac{T_r(x) T_s(x)}{\sqrt{1 - x^2}}\, dx = 0 \qquad (r \ne s) \tag{7.9.9}$$

when r and s are nonnegative integers.


† The name of Chebyshev (or Tschebycheff) is associated with various sets of
polynomials in the literature (see also Secs. 7.13 and 8.14).


The first six are obtained in the form

$$\begin{aligned} T_0(x) &= 1 & T_1(x) &= x \\ T_2(x) &= 2x^2 - 1 & T_3(x) &= 4x^3 - 3x \\ T_4(x) &= 8x^4 - 8x^2 + 1 & T_5(x) &= 16x^5 - 20x^3 + 5x \end{aligned} \tag{7.9.10}$$

and additional ones may be determined from the recurrence formula

$$T_{r+1}(x) = 2x T_r(x) - T_{r-1}(x) \tag{7.9.11}$$
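
The agreement between the recurrence (7.9.11) and the trigonometric definition (7.9.8) is easily confirmed numerically. The following is a minimal sketch; the function name and the sample point x = 0.3 are assumptions made for the illustration.

```python
import math

def chebyshev_values(x, n_max):
    """Return [T_0(x), ..., T_n_max(x)] from the recurrence (7.9.11)."""
    values = [1.0, x]
    for r in range(1, n_max):
        values.append(2 * x * values[r] - values[r - 1])
    return values[: n_max + 1]

x = 0.3
T = chebyshev_values(x, 5)
# The same values from the definition (7.9.8).
closed_form = [math.cos(r * math.acos(x)) for r in range(6)]
```

The two lists agree to rounding error, and `T[5]` matches the explicit form of $T_5(x)$ in (7.9.10).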


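In order to evaluate the factor

$$\gamma_r = \int_{-1}^{1} \frac{T_r^2(x)}{\sqrt{1 - x^2}}\, dx \tag{7.9.12}$$

we again write $x = \cos\theta$ and $T_r(x) = \cos r\theta$, so that there follows directly

$$\gamma_r = \int_0^\pi \cos^2 r\theta\, d\theta = \begin{cases} \pi & (r = 0) \\ \dfrac{\pi}{2} & (r \ne 0) \end{cases} \tag{7.9.13}$$

Thus the nth-degree least-squares polynomial approximation to $f(x)$ over
[-1, 1], relevant to the weighting function $w(x) = 1/\sqrt{1 - x^2}$, is defined by

$$y(x) = \sum_{r=0}^{n} a_r T_r(x) \qquad (-1 \le x \le 1) \tag{7.9.14}$$

where

$$a_0 = \frac{1}{\pi} \int_{-1}^{1} \frac{f(x)}{\sqrt{1 - x^2}}\, dx \qquad a_r = \frac{2}{\pi} \int_{-1}^{1} \frac{f(x) T_r(x)}{\sqrt{1 - x^2}}\, dx \quad (r \ne 0) \tag{7.9.15}$$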


It has the property that, for all polynomials $y_n(x)$ of degree n or less, the integrated
weighted squared error

$$\int_{-1}^{1} \frac{1}{\sqrt{1 - x^2}} [f(x) - y_n(x)]^2\, dx$$

is least when $y_n(x)$ is identified with the right-hand member of (7.9.14).

On the other hand, if we wish to approximate $f(x)$ in the least-squares
sense by the product of $1/\sqrt{1 - x^2}$ and a polynomial, over (-1, 1), with the
weighting function $\sqrt{1 - x^2}$, we are to determine the coefficients in the relation

$$y(x) = \sum_{r=0}^{n} b_r T_r(x) \qquad (-1 < x < 1) \tag{7.9.16}$$

such that

$$\int_{-1}^{1} \sqrt{1 - x^2} \left[ f(x) - \frac{y(x)}{\sqrt{1 - x^2}} \right]^2 dx = \int_{-1}^{1} \sqrt{1 - x^2} \left[ f(x) - \frac{1}{\sqrt{1 - x^2}} \sum_{k=0}^{n} b_k T_k(x) \right]^2 dx = \min$$

The conditions determining the $b$'s are obtained in the form

$$\int_{-1}^{1} T_r(x) \left[ f(x) - \frac{1}{\sqrt{1 - x^2}} \sum_{k=0}^{n} b_k T_k(x) \right] dx = 0 \qquad (r = 0, 1, \ldots, n)$$

and the use of (7.9.9) and (7.9.13), or of (7.5.15) to (7.5.17), yields the
determination

$$b_0 = \frac{1}{\pi} \int_{-1}^{1} f(x)\, dx \qquad b_r = \frac{2}{\pi} \int_{-1}^{1} f(x) T_r(x)\, dx \quad (r \ne 0) \tag{7.9.17}$$

A great variety of other types of least-squares polynomial approximations
can be formulated in terms of other weighting functions. In particular, for the
weighting function

$$w(x) = (1 - x)^\alpha (1 + x)^\beta \qquad (\alpha > -1,\ \beta > -1) \tag{7.9.18}$$

over [-1, 1], which reduces to the Legendre case when $\alpha = \beta = 0$ and to
the Chebyshev case when $\alpha = \beta = -\tfrac{1}{2}$, the rth orthogonal polynomial is
readily found to be of the form

$$\phi_r(x) = C_r (1 - x)^{-\alpha} (1 + x)^{-\beta} \frac{d^r}{dx^r} \left[ (1 - x)^{\alpha + r} (1 + x)^{\beta + r} \right] \tag{7.9.19}$$

which may be identified with the rth Jacobi polynomial when $C_r$ is suitably
specified (see Sec. 8.9).

In particular, the factor $C_r$ for $T_r(x)$ is given by $(-2)^r r!/(2r)!$, so that
(7.9.8) can also be written in the form

$$T_r(x) = \frac{(-2)^r r!}{(2r)!} (1 - x^2)^{1/2} \frac{d^r}{dx^r} (1 - x^2)^{r - 1/2} \tag{7.9.20}$$

Analogous polynomials $S_r(x)$, which are associated with the weighting function
$w(x) = (1 - x^2)^{1/2}$ and which can be expressed in the form

$$S_r(x) = \frac{\sin[(r + 1) \cos^{-1} x]}{\sin(\cos^{-1} x)} = \frac{(-2)^r (r + 1)!}{(2r + 1)!} (1 - x^2)^{-1/2} \frac{d^r}{dx^r} (1 - x^2)^{r + 1/2} \tag{7.9.21}$$

are sometimes called Chebyshev polynomials of the second kind (considered in
Prob. 33).†

For the weighting function

$$w(x) = x^\beta e^{-\alpha x} \qquad (\beta > -1,\ \alpha > 0) \tag{7.9.22}$$

over $[0, \infty)$, there follows

$$\phi_r(x) = C_r x^{-\beta} e^{\alpha x} \frac{d^r}{dx^r} (x^{\beta + r} e^{-\alpha x}) \tag{7.9.23}$$

and the resultant polynomials are frequently called Sonine polynomials, or
generalized Laguerre polynomials (for additional information see Szegö [1967]).

7.10 Properties of Orthogonal Polynomials. 
Recursive Computation 


In order to provide a basis for computational techniques associated with the 
preceding approximations, as well as for later developments, we next exhibit 
some useful properties of orthogonal polynomials. 

First, principally for later reference (Sec. 8.4), it is shown that if $\phi_r(x)$
is the rth member of the orthogonal set relative to the weighting function
$w(x)$ over an interval [a, b], then if $w(x)$ does not change sign in [a, b], the
polynomial $\phi_r(x)$ possesses r distinct real zeros, all of which lie in the interval
(a, b). In order to establish this fact, we notice first that, since $\int_a^b w \phi_r \phi_0\, dx =
A_0 \int_a^b w \phi_r\, dx = 0$ when $r \ge 1$, and $w(x)$ is of constant sign, $\phi_r(x)$ must change
sign at least once in (a, b) when $r \ge 1$. Now let those real zeros of $\phi_r(x)$ which
are of odd multiplicity, and which lie in (a, b), be denoted by $c_1, c_2, \ldots, c_m$
and assume that $m < r$. Then the product

$$(x - c_1)(x - c_2) \cdots (x - c_m)\, \phi_r(x)$$

does not change sign in [a, b]. But, since $m < r$, the coefficient of $\phi_r(x)$ is a
polynomial of degree less than r, and hence, by (7.5.1), we must have

$$\int_a^b w(x)(x - c_1)(x - c_2) \cdots (x - c_m)\, \phi_r(x)\, dx = 0$$

However, since $w(x)$ does not change sign in [a, b], the integrand therefore
has the same property, and a contradiction follows. Hence there must follow
$m = r$, and since the total multiplicity of all zeros is equal to r, all roots must
be real and distinct and must lie in (a, b), as was to be shown.

Next it is shown that each set of orthogonal polynomials satisfies a simple
three-term recurrence formula. In particular, a basis is provided for the
derivation of the relations stated without proof in preceding sections (see Prob. 36).

† The notation $U_r(x)$ is often used in place of $S_r(x)$.

We notice first that

$$ \phi_{k+1}(x) - \frac{A_{k+1}}{A_k} x \phi_k(x) $$

is a polynomial of maximum degree k. Hence, if we write

$$ a_k = \frac{A_{k+1}}{A_k} \tag{7.10.1} $$

it follows that $\phi_{k+1}(x) - a_k x \phi_k(x)$ can be expressed as a linear combination of
$\phi_0(x), \phi_1(x), \ldots, \phi_k(x)$, in the form

$$ \phi_{k+1}(x) - a_k x \phi_k(x) = b_k \phi_k(x) + c_k \phi_{k-1}(x) + \cdots \tag{7.10.2} $$

for some constant values of $b_k, c_k, \ldots$. But, since $x\phi_0(x), x\phi_1(x), \ldots$, and
$x\phi_{k-2}(x)$ are polynomials of degree inferior to k, the two terms in the left-hand
member of (7.10.2) are both orthogonal to $\phi_0, \phi_1, \ldots, \phi_{k-2}$ over $[a, b]$,
relative to $w(x)$. Hence the same statement applies to the right-hand member,
so that the omitted terms in (7.10.2) vanish, and we deduce that $\phi_k(x)$ satisfies a
recurrence formula of the form

$$ \phi_{k+1}(x) = (a_k x + b_k) \phi_k(x) + c_k \phi_{k-1}(x) \tag{7.10.3} $$

where $a_k$ is defined by (7.10.1), and $b_k$ and $c_k$ are certain other constants. In
order that (7.10.3) also hold when $k = 0$, the convention $\phi_{-1}(x) = 0$ may be
adopted.

If we multiply the equal members of (7.10.3) successively by $w\phi_{k+1}$,
$w\phi_k$, and $w\phi_{k-1}$, and integrate each resultant equation over $[a, b]$, we obtain
the additional relations

$$ \gamma_{k+1} = a_k \int_a^b x w(x) \phi_k(x) \phi_{k+1}(x) \, dx \tag{7.10.4} $$

$$ 0 = a_k \int_a^b x w(x) [\phi_k(x)]^2 \, dx + b_k \gamma_k \tag{7.10.5} $$

$$ 0 = a_k \int_a^b x w(x) \phi_{k-1}(x) \phi_k(x) \, dx + c_k \gamma_{k-1} \tag{7.10.6} $$

If k is replaced by k - 1 in the first equation, the result can be used to eliminate
the unknown integral from the third equation and to establish the relation

$$ c_k = -\frac{a_k \gamma_k}{a_{k-1} \gamma_{k-1}} \tag{7.10.7} $$

Hence (7.10.3) can be rewritten in the form

$$ \frac{x \phi_k(x)}{\gamma_k} = \frac{\phi_{k+1}(x)}{a_k \gamma_k} + \frac{\phi_{k-1}(x)}{a_{k-1} \gamma_{k-1}} - \frac{b_k \phi_k(x)}{a_k \gamma_k} \tag{7.10.8} $$

For the purpose of deriving an important consequence of (7.10.8), it is
noted next that if both members of (7.10.8) are multiplied by $\phi_k(y)$, where y
is an arbitrary parameter, and the result of interchanging x and y in the result is
subtracted from that result, the constant $b_k$ is eliminated and the more
symmetrical relation

$$ (x - y) \frac{\phi_k(x) \phi_k(y)}{\gamma_k} = \frac{\phi_{k+1}(x) \phi_k(y) - \phi_k(x) \phi_{k+1}(y)}{a_k \gamma_k} - \frac{\phi_k(x) \phi_{k-1}(y) - \phi_{k-1}(x) \phi_k(y)}{a_{k-1} \gamma_{k-1}} \tag{7.10.9} $$

is obtained. The result of summing the equal members from $k = 0$ to $k = m$
and taking advantage of the "telescoping" of terms on the right is then the
important relation

$$ \sum_{k=0}^{m} \frac{\phi_k(x) \phi_k(y)}{\gamma_k} = \frac{\phi_{m+1}(x) \phi_m(y) - \phi_m(x) \phi_{m+1}(y)}{a_m \gamma_m (x - y)} \tag{7.10.10} $$

known as the Christoffel-Darboux identity. In addition, by considering the
limiting form of that relation as $y \to x$, we obtain the equation

$$ \sum_{k=0}^{m} \frac{[\phi_k(x)]^2}{\gamma_k} = \frac{1}{a_m \gamma_m} [\phi'_{m+1}(x) \phi_m(x) - \phi'_m(x) \phi_{m+1}(x)] \tag{7.10.11} $$

(The usefulness of these relations is illustrated in Probs. 37 and 38 as well as in
Sec. 8.4.)
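The identity (7.10.10) can be checked numerically. The following is a minimal sketch, assuming the Legendre polynomials on [-1, 1] with w(x) = 1, for which the standard values a_k = (2k + 1)/(k + 1) and gamma_k = 2/(2k + 1) apply. The function names are illustrative only and do not come from the text.

```python
from fractions import Fraction

def legendre_values(m, x):
    """Return [P_0(x), ..., P_m(x)] from the Legendre three-term recurrence."""
    vals = [Fraction(1), Fraction(x)]
    for k in range(1, m):
        vals.append(((2 * k + 1) * x * vals[k] - k * vals[k - 1]) / (k + 1))
    return vals[: m + 1]

def christoffel_darboux_sides(m, x, y):
    """Return (left, right) members of (7.10.10) for Legendre polynomials."""
    px = legendre_values(m + 1, x)
    py = legendre_values(m + 1, y)
    gamma = [Fraction(2, 2 * k + 1) for k in range(m + 1)]  # integral of P_k^2
    a_m = Fraction(2 * m + 1, m + 1)                        # A_{m+1} / A_m
    lhs = sum(px[k] * py[k] / gamma[k] for k in range(m + 1))
    rhs = (px[m + 1] * py[m] - px[m] * py[m + 1]) / (a_m * gamma[m] * (x - y))
    return lhs, rhs

lhs, rhs = christoffel_darboux_sides(4, Fraction(1, 3), Fraction(-2, 5))
print(lhs == rhs)
```

Exact rational arithmetic is used so that the two members can be compared for equality rather than to within a rounding tolerance.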

The fact that the set of the polynomials $\phi_k(x)$, which are orthogonal over
$[a, b]$ relative to $w(x)$, satisfies a recurrence formula of form (7.10.3) is often
useful for the purpose of generating members of the set recursively, without
use of the generating function $U_k(x)$ and even without advance knowledge of
the coefficients $a_k$, $b_k$, and $c_k$ for a specific $w(x)$. For this purpose it is convenient
first to determine the polynomials with unit leading coefficients (monic
polynomials), so that

$$ A_k = 1 \tag{7.10.12} $$

and hence also $a_k = 1$, and then to normalize the polynomials thus obtained
in another way if this is desirable. The kth such polynomial is denoted here
by $\bar{\phi}_k(x)$.

Thus, with

$$ \bar{\phi}_0(x) = 1 $$

(7.10.3) gives

$$ \bar{\phi}_1(x) = x + b_0 $$

where $b_0$ is determined by (7.10.5) with $k = 0$:

$$ b_0 = -\frac{\int_a^b x w \, dx}{\int_a^b w \, dx} $$

Next, (7.10.3) gives

$$ \bar{\phi}_2(x) = (x + b_1) \bar{\phi}_1(x) + c_1 $$

where, from (7.10.5) and (7.10.6),

$$ b_1 = -\frac{\int_a^b x w \bar{\phi}_1^2 \, dx}{\int_a^b w \bar{\phi}_1^2 \, dx} \qquad c_1 = -\frac{\int_a^b x w \bar{\phi}_1 \, dx}{\int_a^b w \, dx} $$

In general, $\bar{\phi}_{k+1}(x)$ is obtained from $\bar{\phi}_k(x)$ and $\bar{\phi}_{k-1}(x)$ by use of (7.10.3) with

$$ a_k = 1 \qquad b_k = -\frac{\int_a^b x w \bar{\phi}_k^2 \, dx}{\int_a^b w \bar{\phi}_k^2 \, dx} \qquad c_k = -\frac{\int_a^b x w \bar{\phi}_k \bar{\phi}_{k-1} \, dx}{\int_a^b w \bar{\phi}_{k-1}^2 \, dx} \tag{7.10.13} $$

As was shown in Sec. 1.8, the fact that the polynomials $\phi_0, \ldots, \phi_n$
satisfy a three-term recurrence formula also can be used to simplify the numerical
evaluation of a linear combination of them for a specific value of x. For this
purpose, we identify (1.8.7) with (7.10.3) and obtain

$$ \alpha_k = -(a_k x + b_k) \qquad \beta_k = -c_k $$

Hence, from the results of Sec. 1.8, we deduce that if the sequence $u_n(x)$,
$u_{n-1}(x), \ldots, u_0(x)$ is generated by the recurrence formula

$$ u_k = (a_k x + b_k) u_{k+1} + c_{k+1} u_{k+2} + C_k \tag{7.10.14} $$

for $k = n, n - 1, \ldots, 0$, with

$$ u_{n+1} = u_{n+2} = 0 \tag{7.10.15} $$

then there follows

$$ \sum_{k=0}^{n} C_k \phi_k(x) = \phi_0(x) u_0(x) + [\phi_1(x) - (a_0 x + b_0) \phi_0(x)] u_1(x) \tag{7.10.16} $$

In particular, if (7.10.3) holds when $k = 0$, with $\phi_{-1}(x) = 0$, the right-hand
side of (7.10.16) reduces to $\phi_0(x) u_0(x)$. (This situation could always be made
to exist by suitably defining $a_0$ and/or $b_0$, but this is not always convenient.)

As an illustration, we consider the Chebyshev polynomials of Sec. 7.9,
noticing that they are somewhat exceptional in that whereas they satisfy (7.9.11)

$$ T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x) $$

for $k = 1, 2, \ldots$, so that

$$ a_k = 2 \qquad b_k = 0 \qquad c_k = -1 \qquad (k = 1, 2, \ldots) $$

the recurrence formula must take the form $T_1(x) = x T_0(x)$ when $k = 0$. Thus,
if (for convenience) we also define

$$ a_0 = 2 \qquad b_0 = 0 \qquad c_0 = -1 $$

the recurrence formula is not satisfied when $k = 0$ with $T_{-1}(x) = 0$, in
accordance with which the coefficient of $u_1(x)$ in (7.10.16) then does not vanish but
becomes $-x$. Thus we deduce that

$$ \sum_{k=0}^{n} C_k T_k(x) = u_0(x) - x u_1(x) \tag{7.10.17} $$

where

$$ u_k(x) = 2x u_{k+1}(x) - u_{k+2}(x) + C_k \tag{7.10.18} $$

for $k = n, n - 1, \ldots, 0$, with

$$ u_{n+1}(x) = u_{n+2}(x) = 0 \tag{7.10.19} $$

This Chebyshev algorithm is associated with the name of Clenshaw [1955].
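The backward recurrence (7.10.18), with the starting values (7.10.19) and the final combination (7.10.17), can be sketched as follows. The function name and the sample coefficients are assumptions made for the illustration; the comparison uses the representation T_k(x) = cos(k arccos x), valid for |x| <= 1.

```python
import math

def chebyshev_sum(coeffs, x):
    """Evaluate sum_k C_k T_k(x) by (7.10.17) to (7.10.19).

    coeffs[k] holds C_k for k = 0, ..., n.
    """
    u1 = 0.0  # plays the role of u_{k+1}
    u2 = 0.0  # plays the role of u_{k+2}
    for c in reversed(coeffs):
        u0 = 2.0 * x * u1 - u2 + c
        u2, u1 = u1, u0
    # After the loop, u1 holds u_0 and u2 holds u_1.
    return u1 - x * u2

coeffs = [0.5, -1.25, 3.0, 0.75, -2.0]
x = 0.3
direct = sum(c * math.cos(k * math.acos(x)) for k, c in enumerate(coeffs))
print(chebyshev_sum(coeffs, x), direct)
```

Only two running values are stored, and no individual T_k(x) is ever formed.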

Before proceeding to a consideration of the use of orthogonal polynomials 
when discrete data are involved, it is desirable to establish certain analogies 
between integration and summation and to obtain certain special properties 
of the binomial coefficient functions and related functions, which play the same 
roles in summation and differencing as do the functions $1, x, \ldots, x^n$ in
integration and differentiation.


7.11 Factorial Power Functions and Summation Formulas 


The product $s(s - 1) \cdots (s - n + 1)$, where n is a positive integer, is often
called the factorial nth power of s, and the notation

$$ s^{(n)} = s(s - 1) \cdots (s - n + 1) \tag{7.11.1} $$

is frequently used. It is related to the binomial coefficient function by the equation

$$ \binom{s}{n} = \frac{s^{(n)}}{n!} \tag{7.11.2} $$




In the more general case when n need not be a positive integer, (7.11.1) is
generalized by the definition

$$ s^{(n)} = \frac{\Gamma(s + 1)}{\Gamma(s - n + 1)} \tag{7.11.3} $$

in accordance with which there follows, in particular,

$$ s^{(0)} = 1 \tag{7.11.4} $$

$$ s^{(-1)} = \frac{1}{s + 1} \qquad s^{(-2)} = \frac{1}{(s + 1)(s + 2)} \tag{7.11.5} $$

and so forth.


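In order to establish the usefulness of the notation (7.11.1) or (7.11.3),
we notice that, from (7.11.3), there follows

$$ \Delta_1 s^{(n)} = \frac{\Gamma(s + 2)}{\Gamma(s - n + 2)} - \frac{\Gamma(s + 1)}{\Gamma(s - n + 1)} = \left( \frac{s + 1}{s - n + 1} - 1 \right) \frac{\Gamma(s + 1)}{\Gamma(s - n + 1)} $$

$$ = \frac{n \, \Gamma(s + 1)}{(s - n + 1) \Gamma(s - n + 1)} = n \frac{\Gamma(s + 1)}{\Gamma(s - n + 2)} $$

or

$$ \Delta_1 s^{(n)} = n s^{(n-1)} \tag{7.11.6} $$

where $\Delta_1$ denotes the forward-difference operator with unit spacing, acting on s,
and where use is made of the fundamental property of the gamma function,†

$$ \Gamma(u + 1) = u \Gamma(u) \tag{7.11.7} $$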


Thus the factorial power $s^{(n)}$ is related to the operator $\Delta_1$ just as the ordinary
power $x^n$ is related to the operator $D = d/dx$. In this connection, it is of
interest to notice that Newton's forward-difference formula (4.3.5), with an
error term, can be written in the form

$$ f_s = f_0 + \frac{\Delta f_0}{1!} s^{(1)} + \frac{\Delta^2 f_0}{2!} s^{(2)} + \cdots + \frac{\Delta^n f_0}{n!} s^{(n)} + \frac{h^{n+1} f^{(n+1)}(\xi)}{(n + 1)!} s^{(n+1)} \tag{7.11.8} $$

and is seen to be completely analogous to the Maclaurin expansion, with a
remainder, expressed in the form

$$ F(x) = F(0) + \frac{F'(0)}{1!} x + \frac{F''(0)}{2!} x^2 + \cdots + \frac{F^{(n)}(0)}{n!} x^n + \frac{F^{(n+1)}(\xi)}{(n + 1)!} x^{n+1} $$

† It is often convenient to write $u!$ as an abbreviation for $\Gamma(u + 1)$ even though u is
not a positive integer.

We recall next that, from the telescoping of terms in the expansion

$$ \sum_{s=M}^{N} \Delta_1 f_s = (f_{M+1} - f_M) + (f_{M+2} - f_{M+1}) + \cdots + (f_N - f_{N-1}) + (f_{N+1} - f_N) $$

it follows that

$$ \sum_{s=M}^{N} \Delta_1 f_s = f_{N+1} - f_M \equiv \Big[ f_s \Big]_M^{N+1} \tag{7.11.9} $$

This general relation is seen to be analogous to the relation

$$ \int_a^b f'(x) \, dx = f(b) - f(a) \equiv \Big[ f(x) \Big]_a^b $$

but careful notice should be taken of the fact that the limits on the right in
(7.11.9) are not the same as those on the left. Thus, in particular, we may
deduce from (7.11.9) and (7.11.6) the summation formula

$$ \sum_{s=M}^{N} s^{(n)} = \left[ \frac{s^{(n+1)}}{n + 1} \right]_M^{N+1} \tag{7.11.10} $$

which clearly corresponds to the integral formula

$$ \int_a^b x^n \, dx = \left[ \frac{x^{n+1}}{n + 1} \right]_a^b \qquad (n \ne -1) $$

Since (7.11.8) permits any polynomial in s to be expressed as a linear
combination of factorial powers of s, (7.11.10) then serves to effect the
summation of that polynomial. In illustration, in order to express the sum

$$ S_n = 1 \cdot 3 + 2 \cdot 4 + \cdots + n(n + 2) $$

in closed form, we could first obtain the relation

$$ s^2 + 2s = 0 + 3s^{(1)} + s^{(2)} $$

by use of undetermined coefficients or by using (7.11.8), and then make the
calculation

$$ \sum_{s=1}^{n} [3s^{(1)} + s^{(2)}] = \left[ \tfrac{3}{2} s^{(2)} + \tfrac{1}{3} s^{(3)} \right]_1^{n+1} = \tfrac{3}{2}(n + 1)n + \tfrac{1}{3}(n + 1)n(n - 1) = \tfrac{1}{6} n(n + 1)(2n + 7) $$

The summation could also be effected by making appropriate use of (5.8.4)
or of the Euler-Maclaurin sum formula (5.8.12).
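The difference rule (7.11.6), the summation formula (7.11.10), and the closed form just obtained for S_n can be checked by a short computation. This is a sketch restricted to nonnegative integer exponents, so that the product form (7.11.1) suffices; the function names are illustrative.

```python
from fractions import Fraction

def factorial_power(s, n):
    """s^(n) = s(s-1)...(s-n+1) for an integer n >= 0 (empty product = 1)."""
    out = Fraction(1)
    for j in range(n):
        out *= (s - j)
    return out

def sum_factorial_power(M, N, n):
    """Closed form (7.11.10) for the sum of s^(n) over s = M, ..., N."""
    return (factorial_power(N + 1, n + 1) - factorial_power(M, n + 1)) / (n + 1)

def S(n):
    """1*3 + 2*4 + ... + n(n+2), using s^2 + 2s = 3 s^(1) + s^(2)."""
    return 3 * sum_factorial_power(1, n, 1) + sum_factorial_power(1, n, 2)

for n in range(1, 6):
    print(n, S(n), sum(s * (s + 2) for s in range(1, n + 1)))
```

Note that the upper limit in the closed form is N + 1, not N, in accordance with the remark following (7.11.9).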

There exist a large number of useful identities involving either factorial
power functions or, correspondingly, binomial coefficient functions. (See, for
example, Probs. 46 and 47.) In particular, the relation

$$ s^{(n)} (s - n)^{(k)} = s^{(n+k)} \tag{7.11.11} $$

which follows immediately from the fact that the left-hand member is given by

$$ [s(s - 1) \cdots (s - n + 1)][(s - n)(s - n - 1) \cdots (s - n - k + 1)] $$

is of frequent use.

In addition to the property

$$ \binom{m}{n} = \binom{m}{m - n} \tag{7.11.12} $$

which follows immediately from the definition, many important relations are
obtainable from the identity

$$ \binom{p + m}{n} = \sum_{k=0}^{n} \binom{p}{k} \binom{m}{n - k} = \sum_{k=0}^{n} \binom{m}{k} \binom{p}{n - k} \tag{7.11.13} $$

when n is a nonnegative integer. In order to establish this relation, we multiply
together the series expansions of $(1 + x)^p$ and $(1 + x)^m$, noticing that those
series terminate when p and m are nonnegative integers and are absolutely
convergent infinite series when $|x| < 1$ otherwise, to obtain the results

$$ (1 + x)^{p+m} = \sum_{k=0}^{\infty} \binom{p}{k} x^k \sum_{j=0}^{\infty} \binom{m}{j} x^j = \sum_{n=0}^{\infty} \left[ \sum_{k=0}^{n} \binom{p}{k} \binom{m}{n - k} \right] x^n $$

But since also

$$ (1 + x)^{m+p} = \sum_{n=0}^{\infty} \binom{m + p}{n} x^n \qquad (|x| < 1) $$

and since the coefficient of $x^n$ must be the same in these two forms, the first
form of the desired result (7.11.13) follows. The second form results from
interchanging p and m.

We may notice next that

$$ \binom{-p}{q} = \frac{(-p)(-p - 1) \cdots (-p - q + 1)}{q!} = (-1)^q \frac{p(p + 1) \cdots (p + q - 1)}{q!} $$

and hence

$$ \binom{-p}{q} = (-1)^q \binom{p + q - 1}{q} \tag{7.11.14} $$

when q is a nonnegative integer. Hence we deduce from (7.11.13) and (7.11.14)
the further relations

$$ \binom{m - p}{n} = \sum_{k=0}^{n} (-1)^k \binom{p + k - 1}{k} \binom{m}{n - k} = \sum_{k=0}^{n} (-1)^{n-k} \binom{p + n - k - 1}{n - k} \binom{m}{k} \tag{7.11.15} $$

All these formulas can be expressed alternatively in terms of factorial power
functions by making use of (7.11.2).
Finally, a general formula which will be needed in the sequel is the sum
analogy to integration by parts. In order to derive the desired formula, we
notice first that

$$ \Delta_1 (u_s v_s) = u_{s+1} v_{s+1} - u_s v_s = v_{s+1}(u_{s+1} - u_s) + u_s (v_{s+1} - v_s) = u_s \Delta_1 v_s + v_{s+1} \Delta_1 u_s $$

Hence, by transposition and use of (7.11.9), we deduce a formula for
summation by parts in the form

$$ \sum_{s=M}^{N} u_s \Delta_1 v_s = \Big[ u_s v_s \Big]_M^{N+1} - \sum_{s=M}^{N} v_{s+1} \Delta_1 u_s \tag{7.11.16} $$

Also, by replacing $u_s$ by $v_s$ and $v_s$ by $u_{s-1}$, and transposing terms, we deduce
an alternative form

$$ \sum_{s=M}^{N} u_s \Delta_1 v_s = \Big[ u_{s-1} v_s \Big]_M^{N+1} - \sum_{s=M}^{N} v_s \Delta_1 u_{s-1} \tag{7.11.17} $$
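A quick numerical check of the summation-by-parts formula (7.11.16) may be helpful. The sketch below takes the two sequences as ordinary Python functions of the integer index s; the function name and the sample sequences are assumptions made only for the illustration.

```python
def summation_by_parts_sides(u, v, M, N):
    """Return the left and right members of (7.11.16) for callables u and v."""
    def delta(f, s):
        # Forward difference with unit spacing.
        return f(s + 1) - f(s)

    lhs = sum(u(s) * delta(v, s) for s in range(M, N + 1))
    boundary = u(N + 1) * v(N + 1) - u(M) * v(M)
    rhs = boundary - sum(v(s + 1) * delta(u, s) for s in range(M, N + 1))
    return lhs, rhs

print(summation_by_parts_sides(lambda s: s * s, lambda s: 2 ** s, 0, 10))
```

The bracketed boundary term is evaluated between M and N + 1, which is the point of difference from the integral analogue.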


7.12 Polynomials Orthogonal over Discrete Sets of Points 


For least-squares approximation to a function $F(x)$ over a discrete point set
$S_N$, it is convenient to make use of a set of polynomials which are mutually
orthogonal under summation over $S_N$ with respect to a specified weighting
function. We suppose that N + 1 points are to be employed in the
approximation, with uniform separation h, and that the extreme points are at the ends of
the interval $[a, b]$, where $b - a = Nh$. If we then write

$$ x = a + sh \tag{7.12.1} $$

the variable s takes on the values $s = 0, 1, 2, \ldots, N$ at those points, and
we seek a set of polynomials $\phi_0(s, N), \phi_1(s, N), \ldots, \phi_n(s, N)$ such that $\phi_r$
is of degree r in s, and such that

$$ \sum_{s=0}^{N} w(s) \phi_r(s, N) q_{r-1}(s) = 0 \tag{7.12.2} $$

where $w(s)$ is a specified weighting function, assumed to be nonnegative on $S_N$,
and where $q_{r-1}(s)$ is an arbitrary polynomial of degree r - 1 or less.

The procedure is analogous to that employed in Sec. 7.5. We first set

$$ w(s) \phi_r(s, N) = \Delta_1^r U_r(s, N) \tag{7.12.3} $$

so that (7.12.2) becomes

$$ \sum_{s=0}^{N} [\Delta_1^r U_r(s, N)] q_{r-1}(s) = 0 \tag{7.12.4} $$

and sum by parts r times, noticing that $\Delta_1^r q_{r-1}(s) = 0$, to transform (7.12.4)
to the relation

$$ \Big[ \{\Delta_1^{r-1} U_r(s)\} q_{r-1}(s) - \{\Delta_1^{r-2} U_r(s + 1)\} \Delta_1 q_{r-1}(s) + \cdots + (-1)^{r-1} \{U_r(s + r - 1)\} \Delta_1^{r-1} q_{r-1}(s) \Big]_{s=0}^{s=N+1} = 0 \tag{7.12.5} $$

Since we require that $\phi_r$ be a polynomial of degree r, it follows from (7.12.3)
that $U_r$ must satisfy the difference equation

$$ \Delta_1^{r+1} \left[ \frac{1}{w(s)} \Delta_1^r U_r(s, N) \right] = 0 \tag{7.12.6} $$

on $S_N$, and that (7.12.5) will be satisfied for arbitrary values of $q_{r-1}(s)$, $\Delta_1 q_{r-1}(s)$,
and so forth, when $s = 0$ and when $s = N + 1$ if $U_r(s + r - 1, N)$,
$\Delta_1 U_r(s + r - 2, N), \ldots$, and $\Delta_1^{r-1} U_r(s, N)$ vanish when $s = 0$ and when
$s = N + 1$. It is easily seen that these requirements are equivalent to the 2r
conditions

$$ U_r(0, N) = U_r(1, N) = U_r(2, N) = \cdots = U_r(r - 1, N) = 0 \tag{7.12.7} $$

and

$$ U_r(N + 1, N) = U_r(N + 2, N) = U_r(N + 3, N) = \cdots = U_r(N + r, N) = 0 \tag{7.12.8} $$

Once $U_r$ has been determined, necessarily with an arbitrary multiplicative
constant, there follows, from (7.12.3),

$$ \phi_r(s, N) = \frac{1}{w(s)} \Delta_1^r U_r(s, N) \tag{7.12.9} $$

In consequence of the results of Sec. 7.2, the coefficients in the relation

$$ y = \sum_{r=0}^{n} a_r \phi_r(s, N) \tag{7.12.10} $$

are then determined by the requirement

$$ \sum_{s=0}^{N} w(s) \left[ F(a + sh) - \sum_{r=0}^{n} a_r \phi_r(s, N) \right]^2 = \min \tag{7.12.11} $$

in the form

$$ a_r = \frac{1}{\gamma_r} \sum_{s=0}^{N} w(s) F(a + sh) \phi_r(s, N) \tag{7.12.12} $$

where

$$ \gamma_r \equiv \gamma_r(N) = \sum_{s=0}^{N} w(s) [\phi_r(s, N)]^2 \tag{7.12.13} $$

7.13 Gram Approximation 


We restrict attention here to the case when $w(s) = 1$. Equation (7.12.6) then
requires merely that $U_r(s, N)$ be a polynomial in s of degree 2r and, since
(7.12.7) and (7.12.8) determine its 2r zeros, there follows immediately

$$ U_r(s, N) = C_{rN} [s(s - 1) \cdots (s - r + 1)] \times [(s - N - 1)(s - N - 2) \cdots (s - N - r)] $$

or

$$ U_r(s, N) = C_{rN} s^{(r)} (s - N - 1)^{(r)} \tag{7.13.1} $$

where $C_{rN}$ is an arbitrary constant. Hence we have also

$$ \phi_r(s, N) = C_{rN} \Delta_1^r [s^{(r)} (s - N - 1)^{(r)}] \tag{7.13.2} $$

In order to express this result in a more explicit form, we first expand
$(s - N - 1)^{(r)}$ in terms of factorial powers of $(s - r)$, then use (7.11.11) to
express (7.13.1) in terms of factorial powers of s, and, finally, make use of
(7.11.6). Thus, if we use the second form of (7.11.15), we obtain

$$ (s - N - 1)^{(r)} = r! \binom{s - N - 1}{r} = (-1)^r r! \sum_{k=0}^{r} (-1)^k \binom{N - k}{r - k} \binom{s - r}{k} \tag{7.13.3} $$

and hence, by virtue of (7.13.1) and (7.11.11),

$$ U_r(s, N) = (-1)^r r! \, C_{rN} \sum_{k=0}^{r} \frac{(-1)^k}{k!} \binom{N - k}{r - k} s^{(k+r)} \tag{7.13.4} $$

Thus, by making use of (7.11.6), we obtain the result

$$ \phi_r(s, N) = (-1)^r r! \, C_{rN} \sum_{k=0}^{r} (-1)^k \frac{(k + r)^{(r)}}{k!} \binom{N - k}{r - k} s^{(k)} \tag{7.13.5} $$

which can be transformed to the more convenient form

$$ \phi_r(s, N) = c_{rN} \sum_{k=0}^{r} (-1)^k \frac{(r + k)^{(2k)}}{(k!)^2} \frac{s^{(k)}}{N^{(k)}} \tag{7.13.6} $$

where $c_{rN}$ has been written for the arbitrary constant $(-1)^r r! \, N^{(r)} C_{rN}$. The
expanded form appears as follows:

$$ \phi_r(s, N) = c_{rN} \left[ 1 - \frac{r(r + 1)}{(1!)^2} \frac{s}{N} + \frac{(r - 1)r(r + 1)(r + 2)}{(2!)^2} \frac{s(s - 1)}{N(N - 1)} - \frac{(r - 2)(r - 1)r(r + 1)(r + 2)(r + 3)}{(3!)^2} \frac{s(s - 1)(s - 2)}{N(N - 1)(N - 2)} + \cdots \right] \tag{7.13.7} $$
In most applications of least-squares methods, it is convenient to make
use of an odd number of ordinates, so that N is even, and to write

$$ N = 2M $$

In such cases, it is also convenient to make the change of variables

$$ s = M + t \tag{7.13.8} $$

so that t represents distance from the midpoint of the set $S_N$ in units of the
spacing h (see Fig. 7.1) and takes on the values $0, \pm 1, \pm 2, \ldots, \pm M$ at the
2M + 1 points of $S_N$.

FIGURE 7.1 [figure not reproduced; labels: a, h, b]

If also we choose

$$ c_{rN} = (-1)^r \tag{7.13.9} $$

and write $p_r(t, 2M) = \phi_r(s, 2M)$, the polynomials of degrees zero through
five can be expressed explicitly as follows:

$$ p_0(t, 2M) = 1 $$

$$ p_1(t, 2M) = \frac{t}{M} $$

$$ p_2(t, 2M) = \frac{3t^2 - M(M + 1)}{M(2M - 1)} $$

$$ p_3(t, 2M) = \frac{5t^3 - (3M^2 + 3M - 1)t}{M(M - 1)(2M - 1)} \tag{7.13.10} $$

$$ p_4(t, 2M) = \frac{35t^4 - 5(6M^2 + 6M - 5)t^2 + 3M(M^2 - 1)(M + 2)}{2M(M - 1)(2M - 1)(2M - 3)} $$

$$ p_5(t, 2M) = \frac{63t^5 - 35(2M^2 + 2M - 3)t^3 + (15M^4 + 30M^3 - 35M^2 - 50M + 12)t}{2M(M - 1)(M - 2)(2M - 1)(2M - 3)} $$

These polynomials thus possess the orthogonality property

$$ \sum_{t=-M}^{M} p_i(t, 2M) p_j(t, 2M) = 0 \qquad (i \ne j) $$

and are usually known as Gram polynomials (or Chebyshev polynomials,
although the latter name is usually reserved for either the polynomials
considered in Sec. 7.9 or those to be considered in Sec. 8.14). It may be seen that
the rth polynomial is an even function of t when r is even, and an odd function
of t when r is odd. Also, each polynomial takes on the value unity when $t = M$.
Further, if t is replaced by Mx and M is then increased without limit, it can be
verified that $p_r(t, 2M)$ tends to the rth Legendre polynomial $P_r(x)$:

$$ \lim_{M \to \infty} p_r(Mx, 2M) = P_r(x) $$

In accordance with the results of the preceding section, the nth-degree
least-squares polynomial approximation to the function

$$ f(t) = F[\tfrac{1}{2}(a + b) + ht] \tag{7.13.11} $$

over the (2M + 1)-point set $t = -M, -M + 1, \ldots, -1, 0, 1, \ldots, M - 1, M$
is given by

$$ y = \sum_{r=0}^{n} a_r p_r(t, 2M) \tag{7.13.12} $$

where

$$ a_r = \frac{1}{\gamma_r} \sum_{t=-M}^{M} f(t) p_r(t, 2M) \tag{7.13.13} $$

and

$$ \gamma_r = \sum_{t=-M}^{M} [p_r(t, 2M)]^2 \tag{7.13.14} $$

As in the earlier developments, the factors $\gamma_r$ are independent of the function
$f(t)$ which is to be approximated and can be calculated once and for all.

It should be noted that various conventions are adopted in the literature
with regard to the value assigned to the arbitrary multiplicative constant $c_{rN}$
in (7.13.7) in the general case. In particular, that constant sometimes is so
defined that the coefficient of $s^r$ in $\phi_r(s, N)$ is unity. Another choice is that for
which the values taken on when $s = 0, 1, \ldots, N$ are integers without a
common factor, so that tabulation is simplified. When N + 1 points are used, the
sum of the squares of the N + 1 tabular values of the rth-degree polynomial
corresponding to the normalization (7.13.9) used here is found to be

$$ \gamma_r = \frac{(N + r + 1)! \, (N - r)!}{(2r + 1)(N!)^2} \tag{7.13.15} $$

(see Prob. 53), whereas (7.13.7) shows that the leading coefficient in $\phi_r(s, N)$ is

$$ A_r = \frac{(2r)! \, (N - r)!}{(r!)^2 \, N!} \tag{7.13.16} $$

These results permit tabulations relevant to other normalizations to be
interpreted in terms of the one used here.
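The explicit form (7.13.6), with the normalization (7.13.9), lends itself to direct computation. The following sketch evaluates the Gram polynomials in exact rational arithmetic and compares the sums of squares with (7.13.15). It assumes r <= N, since otherwise a factorial power of N in a denominator vanishes, and the function names are illustrative only.

```python
from fractions import Fraction
from math import factorial

def factorial_power(s, n):
    """s^(n) = s(s-1)...(s-n+1) for an integer n >= 0."""
    out = Fraction(1)
    for j in range(n):
        out *= (s - j)
    return out

def gram(r, s, N):
    """phi_r(s, N) from (7.13.6) with c_rN = (-1)^r, assuming r <= N."""
    total = Fraction(0)
    for k in range(r + 1):
        term = (factorial_power(r + k, 2 * k) / factorial(k) ** 2
                * factorial_power(s, k) / factorial_power(N, k))
        total += (-1) ** k * term
    return (-1) ** r * total

def gamma_r(r, N):
    """Sum of squares of the N + 1 tabular values, as in (7.12.13) with w = 1."""
    return sum(gram(r, s, N) ** 2 for s in range(N + 1))

N = 4  # five points, M = 2
for r in range(4):
    print(r, [str(gram(r, s, N)) for s in range(N + 1)], gamma_r(r, N))
```

With N = 4 the printed rows give the tabular values of the polynomials of degrees zero through three at s = 0, ..., 4 (that is, t = -2, ..., 2), together with the sums of squares 5, 5/2, 7/2, and 10.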


7.14 Example: Five-Point Least-Squares Approximation 


In order to illustrate a method of using the preceding results, we consider here 
the case in which only five ordinates are used, so that M = 2. The relevant 
orthogonal polynomials of degrees zero through four are then obtained from 
(7.13.10) in the forms 


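$$ p_0(t) = 1 \qquad p_1(t) = \tfrac{1}{2} t $$

$$ p_2(t) = \tfrac{1}{2}(t^2 - 2) \qquad p_3(t) = \tfrac{1}{6}(5t^3 - 17t) \tag{7.14.1} $$

$$ p_4(t) = \tfrac{1}{12}(35t^4 - 155t^2 + 72) $$

if we write $p_r(t)$ for $p_r(t, 4)$.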

We may notice that $p_5(t)$, as defined by (7.13.10), is nonexistent when
M = 2. This situation corresponds to the fact that the use of polynomials of
degrees zero through five over five points would not lead to a determinate
problem, since infinitely many fifth-degree polynomials would fit the data
exactly at those points. Further, the use of polynomials of degrees zero through
four over five points truly would not be a least-squares procedure since it
necessarily would lead to the fourth-degree interpolation polynomial which
fits the data exactly. Thus $p_4(t)$ is not needed when five points are used unless
an exact fit at those points is desired, in which case the use of one of the methods
given in earlier chapters usually is to be preferred.

Values of the polynomials at the five relevant points may be tabulated as
in Table 7.1.

                     Table 7.1
      t      p_0     p_1     p_2     p_3      f
     -2       1      -1       1      -1     f_{-2}
     -1       1      -0.5    -0.5     2     f_{-1}
      0       1       0      -1       0     f_0
      1       1       0.5    -0.5    -2     f_1
      2       1       1       1       1     f_2
   gamma_r    5       2.5     3.5    10

According to (7.13.12) to (7.13.14), the coefficient $a_r$ of each $p_r(t)$
used in the approximation is obtained by multiplying each entry in its column
by the corresponding entry in the column of values of $f(t)$, summing, and
dividing the result by $\gamma_r$, which is listed at the foot of the $p_r$ column. Once the a's
are calculated, the least-squares polynomial $y(t)$ can be obtained explicitly by
forming the corresponding combination of the polynomials listed in (7.14.1).
If only the value of $y(t)$ at a tabular point is required, this explicit form of $y(t)$
is not needed, since the required value is obtained by merely multiplying the
tabulated value of each $p_r$ for that t by $a_r$, and summing the results.
In illustration, suppose that we are provided with the empirical data

      x       0.0     0.2     0.4     0.6     0.8
      F(x)    1.10    1.78    2.74    4.12    5.69

together with some assurance that the observed values are in error by no more
than a few units in the last place given, and that the true function is "smooth."
In order to obtain least-squares polynomial approximations by use of Table 7.1,
we then set $x = 0.4 + 0.2t$, or $t = 5x - 2$, and write $F(0.4 + 0.2t) = f(t)$.
Calculation then gives

$$ a_0 = 3.086 \qquad a_1 = 2.304 \qquad a_2 = 0.314 \qquad a_3 = -0.009 $$

so that least-squares approximations of degrees 1, 2, and 3 are obtained by
retaining two, three, or four terms in the relation

$$ f(t) \approx 3.086 p_0(t) + 2.304 p_1(t) + 0.314 p_2(t) - 0.009 p_3(t) $$

The corresponding smoothed values at the tabular points may be obtained,
from Table 7.1, as follows:

      t       -2       -1        0        1        2
      f       1.10     1.78     2.74     4.12     5.69
      y_3     1.105    1.759    2.772    4.099    5.695
      y_2     1.096    1.777    2.772    4.081    5.704
      y_1     0.782    1.934    3.086    4.238    5.390

The RMS value of the five deviations from the observed values is found to be
0.0198 for the third-degree approximation, 0.0235 for the second-degree
approximation, and 0.264 for the linear approximation. The use of (7.3.34)
then leads to corresponding estimates of 0.0443, 0.0372, and 0.341, respectively,
for the RMS error in the observed values. Clearly, only the first two of these
estimates are in accord with the given information.
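The tabular procedure just described can be followed step by step in code. This is a minimal sketch using the entries of Table 7.1 and the five observed values. The variable and function names are illustrative, and exact fractions are used so that any rounding occurs only on output.

```python
from fractions import Fraction as Fr

# Rows hold the values of p_0, ..., p_3 at t = -2, -1, 0, 1, 2 (Table 7.1).
P = [
    [1, 1, 1, 1, 1],
    [-1, Fr(-1, 2), 0, Fr(1, 2), 1],
    [1, Fr(-1, 2), -1, Fr(-1, 2), 1],
    [-1, 2, 0, -2, 1],
]
f = [Fr("1.10"), Fr("1.78"), Fr("2.74"), Fr("4.12"), Fr("5.69")]

def coefficients(P, f):
    """a_r = (sum of f * p_r) / gamma_r, as in (7.13.13) and (7.13.14)."""
    out = []
    for col in P:
        gamma = sum(p * p for p in col)
        out.append(sum(p * y for p, y in zip(col, f)) / gamma)
    return out

def smoothed(P, a, degree):
    """Values of the least-squares polynomial of the given degree at the five points."""
    return [sum(a[r] * P[r][i] for r in range(degree + 1))
            for i in range(len(P[0]))]

a = coefficients(P, f)
print([round(float(v), 3) for v in a])
for degree in (3, 2, 1):
    print(degree, [round(float(v), 3) for v in smoothed(P, a, degree)])
```

Because the coefficients here are carried without rounding, the smoothed values may differ from the tabulated ones by a unit in the last place, the tabulated values having been formed from the rounded coefficients.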

If the smaller of these estimates is accepted as the more appropriate one,
we may conclude that the additional smoothing afforded by the use of a parabolic
approximation, in place of a cubic, probably represents a further removal of
"noise" rather than a departure from the unknown true function. The use of a
linear approximation could not be so justified.

If additional values of the least-squares polynomial are desired, they
may be obtained conveniently by interpolation. However, if the equation
of the parabolic approximation is required, it may be written down in the form

$$ y = 3.086 p_0(t) + 2.304 p_1(t) + 0.314 p_2(t) = 2.772 + 1.152t + 0.157t^2 $$

and reduced, if so desired, to the form

$$ y = 1.096 + 2.620x + 3.925x^2 $$

In particular, this result supplies the approximations 2.62, 5.76, and 8.90 to 
the slope of the unknown function at x = 0.0, 0.4, and 0.8, respectively, whereas 
the third-degree approximation would yield the values 2.30, 5.89, and 8.58. 
On the other hand, the result of differentiating the fourth-degree interpolation 
polynomial, which takes on the five observed values exactly, would give the 
respective values 3.40, 5.89, and 7.48. 

By expressing the a's explicitly in terms of the observed values, we may
obtain formulas which express the smoothed values directly in terms of the
observed ones. Thus, corresponding to the third-degree least-squares
approximation over five points, we obtain the formula

$$ y_0 = a_0 - a_2 = \tfrac{1}{5}(f_{-2} + f_{-1} + f_0 + f_1 + f_2) - \tfrac{1}{7}(2f_{-2} - f_{-1} - 2f_0 - f_1 + 2f_2) $$

or

$$ y_0 = \tfrac{1}{35}(-3f_{-2} + 12f_{-1} + 17f_0 + 12f_1 - 3f_2) \tag{7.14.2} $$

for the smoothed value at the midpoint t = 0, and the formulas

$$ \begin{aligned} y_{-2} &= \tfrac{1}{70}(69f_{-2} + 4f_{-1} - 6f_0 + 4f_1 - f_2) \\ y_{-1} &= \tfrac{1}{35}(2f_{-2} + 27f_{-1} + 12f_0 - 8f_1 + 2f_2) \\ y_1 &= \tfrac{1}{35}(2f_{-2} - 8f_{-1} + 12f_0 + 27f_1 + 2f_2) \\ y_2 &= \tfrac{1}{70}(-f_{-2} + 4f_{-1} - 6f_0 + 4f_1 + 69f_2) \end{aligned} \tag{7.14.3} $$

are obtained in a similar way. It is of interest to notice that these formulas can
also be expressed in the compact forms

$$ \begin{aligned} y_{-2} &= f_{-2} - \tfrac{1}{70}\delta^4 f_0 & y_{-1} &= f_{-1} + \tfrac{2}{35}\delta^4 f_0 \\ y_0 &= f_0 - \tfrac{3}{35}\delta^4 f_0 \\ y_1 &= f_1 + \tfrac{2}{35}\delta^4 f_0 & y_2 &= f_2 - \tfrac{1}{70}\delta^4 f_0 \end{aligned} \tag{7.14.4} $$

The simplicity of these last forms is due to the fact that the degree of the
least-squares polynomial is exactly one less than the degree of the polynomial
which would be uniquely determined by the five data. In the cases when this
difference exceeds unity, the formulas are less simply expressed in terms of
differences, particularly for off-center points, as will be seen.

Explicit formulas which avoid the necessity of effecting the summations
may also be obtained by first resolving the relations (7.14.1) in the forms

$$ \begin{aligned} 1 &= p_0(t) \qquad t = 2p_1(t) \\ t^2 &= 2p_0(t) + 2p_2(t) \qquad t^3 = \tfrac{1}{5}[34p_1(t) + 6p_3(t)] \\ t^4 &= \tfrac{1}{35}[238p_0(t) + 310p_2(t) + 12p_4(t)] \end{aligned} \tag{7.14.5} $$

Now the interpolation polynomial (of degree 4) which agrees exactly with
$f(t)$ when $t = 0, \pm 1$, and $\pm 2$ can be expressed in the Stirling form (Sec. 4.5)

$$ f_0 + (\mu\delta f_0)t + (\tfrac{1}{2}\delta^2 f_0)t^2 + \tfrac{1}{6}\mu\delta^3 f_0 (t^3 - t) + \tfrac{1}{24}\delta^4 f_0 (t^4 - t^2) \tag{7.14.6} $$

and hence, by introducing (7.14.5) into (7.14.6), we obtain the relation

$$ f(t) = (f_0 + \delta^2 f_0 + \tfrac{1}{5}\delta^4 f_0) p_0(t) + (2\mu\delta f_0 + \tfrac{4}{5}\mu\delta^3 f_0) p_1(t) + (\delta^2 f_0 + \tfrac{2}{7}\delta^4 f_0) p_2(t) + (\tfrac{1}{5}\mu\delta^3 f_0) p_3(t) + (\tfrac{1}{70}\delta^4 f_0) p_4(t) \tag{7.14.7} $$

when t is restricted to the values $0, \pm 1$, and $\pm 2$. For other values of t, the
right-hand member represents the fourth-degree interpolation polynomial which
coincides with $f(t)$ at those five points. The associated error $E(t)$ can be
expressed in the familiar form

$$ E(t) = \frac{t(t^2 - 1)(t^2 - 4)}{120} f^{v}(\xi) \qquad (|\xi| < 2) $$

when $f^{v}(t)$ exists and is continuous for $-2 \le t \le 2$.

Since the right-hand member of (7.14.7) accordingly is the polynomial
which would be afforded by fourth-degree least squares, over the five points
involved, and since the coefficients of the p's are independent of the number of
p's retained, it follows that the third-degree least-squares polynomial relative to
those points is then obtained by deleting the term involving $p_4(t)$. In particular,
when attention is restricted to the five points themselves, the resultant formula
can be expressed in the form

$$ y(t) = f(t) - (\tfrac{1}{70}\delta^4 f_0) p_4(t) \qquad (t = 0, \pm 1, \pm 2) \tag{7.14.8} $$

in accordance with (7.14.4). Similarly, the first-degree least-squares polynomial
relevant to five points may be obtained by retaining only $p_0(t)$ and $p_1(t)$ in
the right-hand member of (7.14.7).

The methods of this section are readily generalized to cases in which more
than five points are used in the least-squares calculation.

7.15 Smoothing Formulas


In place of approximating f(t) by a single least-squares polynomial of degree n 
over the entire range of an extensive tabulation, it is frequently desirable to 
replace each tabulated value by the value taken on by a least-squares polynomial 
of degree n relevant to a subrange of 2M + 1 points centered, if possible, at the 
point for which the entry is to be modified. Thus, except for points near the ends 
of the range of tabulation, each smoothed value is obtained from a distinct 
least-squares polynomial. In this section we list certain sets of smoothing 
formulas which are obtainable for this purpose by the methods of the preceding 
section. 


For first-degree least-squares approximation relevant to three points, the
formulas are of the form

$$ \begin{aligned} y_{-1} &= \tfrac{1}{6}(5f_{-1} + 2f_0 - f_1) = f_{-1} - \tfrac{1}{6}\delta^2 f_0 \\ y_0 &= \tfrac{1}{3}(f_{-1} + f_0 + f_1) = f_0 + \tfrac{1}{3}\delta^2 f_0 \\ y_1 &= \tfrac{1}{6}(-f_{-1} + 2f_0 + 5f_1) = f_1 - \tfrac{1}{6}\delta^2 f_0 \end{aligned} \tag{7.15.1} $$

whereas the formulas relevant to five points are

$$ \begin{aligned} y_{-2} &= \tfrac{1}{5}(3f_{-2} + 2f_{-1} + f_0 - f_2) \\ y_{-1} &= \tfrac{1}{10}(4f_{-2} + 3f_{-1} + 2f_0 + f_1) \\ y_0 &= \tfrac{1}{5}(f_{-2} + f_{-1} + f_0 + f_1 + f_2) \end{aligned} \tag{7.15.2} $$

where the omitted formulas for $y_1$ and $y_2$ are obtained from the formulas for
$y_{-1}$ and $y_{-2}$ by reversing the numbering of the ordinates. Thus, for example,
if first-degree five-point least squares were to be used, the central formula would
be used for all values except the first two and the last two, for which the
off-center formulas would be used.

The formulas for third-degree five-point least squares were obtained in
the preceding section and are listed again, for convenient reference, in the forms

$$ \begin{aligned} y_{-2} &= \tfrac{1}{70}(69f_{-2} + 4f_{-1} - 6f_0 + 4f_1 - f_2) = f_{-2} - \tfrac{1}{70}\delta^4 f_0 \\ y_{-1} &= \tfrac{1}{35}(2f_{-2} + 27f_{-1} + 12f_0 - 8f_1 + 2f_2) = f_{-1} + \tfrac{2}{35}\delta^4 f_0 \\ y_0 &= \tfrac{1}{35}(-3f_{-2} + 12f_{-1} + 17f_0 + 12f_1 - 3f_2) = f_0 - \tfrac{3}{35}\delta^4 f_0 \end{aligned} \tag{7.15.3} $$

    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

whereas the corresponding seven-point formulas are

$$ \begin{aligned} y_{-3} &= \tfrac{1}{42}(39f_{-3} + 8f_{-2} - 4f_{-1} - 4f_0 + f_1 + 4f_2 - 2f_3) \\ y_{-2} &= \tfrac{1}{42}(8f_{-3} + 19f_{-2} + 16f_{-1} + 6f_0 - 4f_1 - 7f_2 + 4f_3) \\ y_{-1} &= \tfrac{1}{42}(-4f_{-3} + 16f_{-2} + 19f_{-1} + 12f_0 + 2f_1 - 4f_2 + f_3) \\ y_0 &= \tfrac{1}{21}(-2f_{-3} + 3f_{-2} + 6f_{-1} + 7f_0 + 6f_1 + 3f_2 - 2f_3) \end{aligned} \tag{7.15.4} $$

    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Finally, the fifth-degree seven-point least-squares formulas may be listed as
follows:

$$ \begin{aligned} y_{-3} &= \tfrac{1}{924}(923f_{-3} + 6f_{-2} - 15f_{-1} + 20f_0 - 15f_1 + 6f_2 - f_3) = f_{-3} - \tfrac{1}{924}\delta^6 f_0 \\ y_{-2} &= \tfrac{1}{154}(f_{-3} + 148f_{-2} + 15f_{-1} - 20f_0 + 15f_1 - 6f_2 + f_3) = f_{-2} + \tfrac{1}{154}\delta^6 f_0 \\ y_{-1} &= \tfrac{1}{308}(-5f_{-3} + 30f_{-2} + 233f_{-1} + 100f_0 - 75f_1 + 30f_2 - 5f_3) = f_{-1} - \tfrac{5}{308}\delta^6 f_0 \\ y_0 &= \tfrac{1}{231}(5f_{-3} - 30f_{-2} + 75f_{-1} + 131f_0 + 75f_1 - 30f_2 + 5f_3) = f_0 + \tfrac{5}{231}\delta^6 f_0 \end{aligned} \tag{7.15.5} $$

    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


The use of an nth-degree least-squares polynomial relevant to 2M + 1
points essentially assumes that the true function can be approximated by
some nth-degree polynomial over each subrange of 2M + 1 points, but it
admits the possibility that no single nth-degree polynomial may be satisfactory
over the entire range. The amount of smoothing increases with the number of
points used in the smoothing formula and decreases with increasing values of
the degree n.
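The weights appearing in formulas such as (7.15.2) to (7.15.5) can be generated mechanically. The sketch below is one possible route, assuming the Gram polynomials in the form (7.13.6) with the normalization (7.13.9): the weight attached to the ordinate at position j in the smoothed value at position i is the sum over r of p_r(i) p_r(j) / gamma_r. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def fpow(s, n):
    """Factorial power s^(n) for an integer n >= 0."""
    out = Fraction(1)
    for j in range(n):
        out *= (s - j)
    return out

def gram(r, s, N):
    """phi_r(s, N) from (7.13.6) with c_rN = (-1)^r, assuming r <= N."""
    return (-1) ** r * sum(
        (-1) ** k * fpow(r + k, 2 * k) / factorial(k) ** 2
        * fpow(s, k) / fpow(N, k)
        for k in range(r + 1))

def smoothing_weights(M, n, t):
    """Weights of f_{-M}, ..., f_M in the nth-degree (2M+1)-point value y_t."""
    N = 2 * M
    i = t + M
    weights = []
    for j in range(N + 1):
        w = Fraction(0)
        for r in range(n + 1):
            g = sum(gram(r, s, N) ** 2 for s in range(N + 1))
            w += gram(r, i, N) * gram(r, j, N) / g
        weights.append(w)
    return weights

print(smoothing_weights(2, 3, 0))    # central third-degree five-point formula
print(smoothing_weights(3, 3, -3))   # end-point third-degree seven-point formula
```

The first line printed reproduces the weights (-3, 12, 17, 12, -3)/35, and the second the weights (39, 8, -4, -4, 1, 4, -2)/42.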

It is often desirable and convenient to employ a smoothing technique
involving a relatively small number of points, so that the relevant formulas are
of simple form, and to iterate the process as many times as appears to be
desirable. The degree n is chosen to be as small as possible, in consistency with
the assumption that differences of the true function, of order higher than n, are
small. If such a process were iterated indefinitely, the sequence of smoothed
functions would tend to the least-squares polynomial of degree n relevant to the
entire range of tabulated values. The analyst can and generally must rely upon
his judgment with regard to the stage at which the iteration is to be terminated,
so that most of the noise is eliminated but essential characteristics of the function
are not appreciably modified. The choice of n is often dictated by the fact that
the first n differences of the observed function f are fairly regular, whereas the
(n + 1)th differences fluctuate erratically and have a mean value near zero.

As an illustration, we consider the data listed in the second column of 
Table 7.2. A plot of the given data suggests that, whereas the true function is 
almost certainly not linear, it can be fairly approximated by a linear function 


Table 7.2†

                 5-point   5-point
  x     F(x)      once      twice    Spencer   W and R
  0      431       402       405                 419
  1      409       423       422                 422
  2      429       444       439                 435
  3      422       459       456                 454
  4      530       469       472                 473
  5      505       483       485                 487
  6      459       504       499                 496
  7      499       510       516                 508
  8      526       527       536                 526
  9      563       554       557                 550
 10      587       584       585       582       578
 11      595       612       616       614       610
 12      647       649       650       648       646
 13      669       683       684       682       685
 14      746       720       720       716       724
 15      760       756       752       749       758
 16      778       792       784                 787
 17      828       810       815                 812
 18      846       841       847                 837
 19      836       876       880                 868
 20      916       914       922                 910
 21      956       960       966                 961
 22     1014      1019      1012                1016
 23     1076      1061      1060                1069
 24     1134      1106      1107                1112
 25     1124      1152      1154                1141


† These data were taken from Spencer [1904] and have been analyzed in various 
ways by Spencer, by Whittaker and Robinson [1944], and others. 


over any subrange of, say, three or five points. The smoothed data given by the 
first-degree five-point formulas of (7.15.2) are listed in the third column of the 
table. Each smoothed value except the two values at each end of the tabulation 
is obtained very simply as the average of the five values centered at the point 
considered. Off-center formulas are used for those points. A second application 
of this process leads to the values listed in the fourth column and is represented 
[FIGURE 7.2]


by a continuous curve in Fig. 7.2. A quantitative estimate of the degree of 
smoothing is afforded by the fact that the means of the absolute values of the 
second and third differences of the given data are 41 and 75, respectively, 
whereas the corresponding means for the results of the second smoothing are 
2.1 and 2.6, respectively. At the same time, it appears that the characteristic 
trend of the data is preserved in the smoothing. The results of applying the 
first-degree three-point formulas of (7.15.1) three times are found to be quite 
similar to the results of using the five-point formulas twice in the present example. 
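
The third column of Table 7.2 can be checked with a short sketch such as the following. It assumes the first-degree five-point formulas have the usual least-squares form: the plain five-value average at interior points, and the straight line fitted to the first (or last) five values evaluated at the two end abscissas. The function name smooth5_linear is hypothetical and is not taken from the text.

```python
def smooth5_linear(f):
    """First-degree five-point least-squares smoothing of equally spaced data.

    Assumed form of the five-point linear formulas: the centred average in
    the interior and the fitted end-window line at the two points nearest
    each end of the tabulation.
    """
    n = len(f)
    if n < 5:
        raise ValueError("at least five points are required")
    y = [0.0] * n
    # Interior points: average of the five values centred at the point.
    for i in range(2, n - 2):
        y[i] = sum(f[i - 2:i + 3]) / 5
    # Off-centre formulas at the left end.
    a, b, c, d, e = f[:5]
    y[0] = (3 * a + 2 * b + c - e) / 5
    y[1] = (4 * a + 3 * b + 2 * c + d) / 10
    # The same formulas, mirrored, at the right end.
    a, b, c, d, e = f[-1], f[-2], f[-3], f[-4], f[-5]
    y[-1] = (3 * a + 2 * b + c - e) / 5
    y[-2] = (4 * a + 3 * b + 2 * c + d) / 10
    return y


# Second column of Table 7.2.
data = [431, 409, 429, 422, 530, 505, 459, 499, 526, 563, 587, 595, 647,
        669, 746, 760, 778, 828, 846, 836, 916, 956, 1014, 1076, 1134, 1124]
once = smooth5_linear(data)    # compare with the third column
twice = smooth5_linear(once)   # compare with the fourth column
```

The second pass is applied here to the unrounded output of the first pass; whether the tabulated values were produced that way is an assumption.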

In the fifth column of Table 7.2 are listed the results obtained by Spencer 
by use of an elaborate 21-point formula which yields smoothed values only at 
points which are more than 10 intervals away from the ends. The sixth column 
of the table lists results obtained by Whittaker and Robinson, by use of another 
21-point formula combined with an appreciable amount of auxiliary calculation 
relevant to the smoothing of the first and last 10 entries. Whereas the smoothed 
values generally do not differ appreciably from those obtained (much more 
simply) in the fourth column, the advantage in smoothness actually belongs to 
the results of the simpler method, in the sense that the mean absolute second 
and third differences relevant to the data of the sixth column are found to be 
5.2 and 3.4, respectively, as compared with 2.1 and 2.6 for the data of the fourth 
column. 
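
The smoothness measure used in this comparison, the mean absolute value of the differences of a given order, can be sketched as follows. The helper name mean_abs_difference is hypothetical; the example data are illustrative only and are not taken from Table 7.2.

```python
def mean_abs_difference(values, order):
    """Mean of the absolute values of the forward differences of given order."""
    d = list(values)
    for _ in range(order):
        # One differencing pass shortens the sequence by one entry.
        d = [b - a for a, b in zip(d, d[1:])]
    if not d:
        raise ValueError("not enough values for a difference of this order")
    return sum(abs(v) for v in d) / len(d)


# Illustration: for a quadratic the second differences are constant
# and the third differences vanish.
squares = [k * k for k in range(6)]
second = mean_abs_difference(squares, 2)
third = mean_abs_difference(squares, 3)
```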

It should be emphasized, however, that the smallness of certain mean 
absolute differences cannot in itself be taken as an indication of a satisfactory 
smoothing. By repeating the smoothing which led to column three indefinitely 
often, we eventually would be led to a “smoothed curve” which is represented 
by a straight line over the entire range, and hence for which all differences of 
order greater than one would vanish. This linear approximation would be 
obtained directly by use of first-degree 26-point formulas. 

It is conceivable, of course, that the deviation from linearity of the 
smoothed curve is still predominantly noise and that a much more drastic 
smoothing is indeed called for. It is at this point that the judgment of the 
analyst (or the weight of additional evidence) must be brought into play. 

As a further example, a plot of the data 


x 0 1 2 3 4 5 6 7 8 


f(x) 54 145 227 359 401 342 259 112 65 


(see Fig. 7.3) suggests that the true function can be approximated by a 
third-degree polynomial over each subrange of five points. The use of the 
formulas of (7.15.3) yields the smoothed values 


x 0 1 2 3 4 5 6 7 8 


y(x) 57 134 244 348 393 352 242 124 62 


which are plotted and joined by a continuous curve in Fig. 7.3. 
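
The smoothed values above can be reproduced by the following sketch. It assumes the third-degree five-point formulas have the standard least-squares form (central weights -3, 12, 17, 12, -3 over 35, with the corresponding off-centre weights at the two end points of each end window); the function name smooth5_cubic is hypothetical.

```python
def smooth5_cubic(f):
    """Third-degree five-point least-squares smoothing of equally spaced data."""
    n = len(f)
    if n < 5:
        raise ValueError("at least five points are required")
    y = [0.0] * n
    # Central formula at interior points.
    for i in range(2, n - 2):
        a, b, c, d, e = f[i - 2:i + 3]
        y[i] = (-3 * a + 12 * b + 17 * c + 12 * d - 3 * e) / 35

    def end_values(a, b, c, d, e):
        # Fitted cubic evaluated at the first and second point of the window.
        first = (69 * a + 4 * b - 6 * c + 4 * d - e) / 70
        second = (2 * a + 27 * b + 12 * c - 8 * d + 2 * e) / 35
        return first, second

    y[0], y[1] = end_values(*f[:5])
    # Mirror the last five values to reuse the same end formulas.
    y[-1], y[-2] = end_values(*f[:-6:-1])
    return y


data = [54, 145, 227, 359, 401, 342, 259, 112, 65]
smoothed = [round(v) for v in smooth5_cubic(data)]
```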

Whereas it is possible to determine a set of orthogonal polynomials 
over a discrete set of points relative to a specified weighting function w (see 
also Secs. 7.16 and 9.5) and derive corresponding smoothing formulas, a more 
convenient procedure which tends to accomplish about the same purpose, when 
w does not vary excessively, consists of applying the preceding smoothing 
formulas to the product wf and then dividing the result by w. More generally, 
the function f may be first transformed in an appropriate way to a new function 
g, and the new function g may be smoothed, after which the inverse transforma- 
tion may be applied to the smoothed function. In particular, in the case of the 
last preceding example, the graph of the function f(x) (Fig. 7.3) indicates a 
resemblance to a function of the form exp [-(Ax^2 + Bx + C)] and suggests 
that the smoothing be applied to log f(x), rather than to f(x) itself. 

Finally, it may be pointed out that the central smoothing formulas can be 
obtained rather simply without explicitly determining the least-squares 
polynomials involved. In this connection, we notice that the orthogonal 
polynomials p_r(t, 2M) defined in Sec. 7.13 vanish at t = 0 when r is odd. Hence, 
if n is even, the central smoothing formula corresponding to a least-squares 
polynomial of degree n will be identical with the formula corresponding to that 
of degree n + 1. 

For n = 0 or n = 1, there follows merely y_0 = a_0 and hence, since 
p_0(t, 2M) = 1, 


y_0 = \frac{1}{2M + 1} \sum_{r=-M}^{M} f_r        (7.15.6)


Thus, as in the special cases of (7.15.1) and (7.15.2), each smoothed value of 
f_0 is the average of the 2M + 1 values centered about f_0. For n = 2 or n = 3, 
reference to (7.13.10) gives 


y_0 = a_0 + a_2 p_2(0, 2M) = a_0 - \frac{M + 1}{2M - 1} a_2        (7.15.7)

where 

a_0 = \frac{1}{2M + 1} \sum_{r=-M}^{M} f_r        (7.15.8)

and 

a_2 = \frac{1}{\gamma_2} \frac{1}{M(2M - 1)} \sum_{r=-M}^{M} [3r^2 - M(M + 1)] f_r        (7.15.9)

with 

\gamma_2 = \sum_{r=-M}^{M} p_2^2(r, 2M)        (7.15.10)

The value of \gamma_2 is given by (7.13.15) with N = 2M and r = 2, 


\gamma_2 = \frac{(2M + 1)(2M + 2)(2M + 3)}{10M(2M - 1)}        (7.15.11)


and the insertion of (7.15.8), (7.15.9), and (7.15.11) into (7.15.7) leads 
immediately to the required formula 


y_0 = \frac{3}{(2M - 1)(2M + 1)(2M + 3)} \sum_{r=-M}^{M} [(3M^2 + 3M - 1) - 5r^2] f_r        (7.15.12)


which specializes to the central formulas of (7.15.3) and (7.15.4) when M = 2 
and M = 3, respectively. A similar analysis leads to the central smoothing 
formula 


y_0 = \frac{15}{4(4M^2 - 1)(4M^2 - 9)(2M + 5)} \sum_{r=-M}^{M} [(15M^4 + 30M^3 - 35M^2 - 50M + 12) - 35(2M^2 + 2M - 3) r^2 + 63 r^4] f_r        (7.15.13)


relevant to fourth- or fifth-degree least-squares approximation using 2M + 1 
points, which specializes to the central formula of (7.15.5) when M = 3.† 
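
As a check on the two general central formulas, the weights can be generated in exact rational arithmetic and compared with the familiar five- and seven-point cases. The following is a minimal sketch; the function names are hypothetical, and the coefficient patterns are those of (7.15.12) and (7.15.13) as written above.

```python
from fractions import Fraction


def central_weights_cubic(M):
    """Weights of f_r, r = -M..M, in the central formula (7.15.12)."""
    scale = Fraction(3, (2 * M - 1) * (2 * M + 1) * (2 * M + 3))
    const = 3 * M * M + 3 * M - 1
    return [scale * (const - 5 * r * r) for r in range(-M, M + 1)]


def central_weights_quintic(M):
    """Weights of f_r, r = -M..M, in the central formula (7.15.13); needs M >= 2."""
    scale = Fraction(15, 4 * (4 * M * M - 1) * (4 * M * M - 9) * (2 * M + 5))
    const = 15 * M**4 + 30 * M**3 - 35 * M**2 - 50 * M + 12
    quad = 35 * (2 * M * M + 2 * M - 3)
    return [scale * (const - quad * r * r + 63 * r**4)
            for r in range(-M, M + 1)]


five_point = central_weights_cubic(2)     # expected: (-3, 12, 17, 12, -3)/35
seven_point = central_weights_cubic(3)    # expected: (-2, 3, 6, 7, 6, 3, -2)/21
seven_quintic = central_weights_quintic(3)
```

With M = 2 the second function returns the weights 0, 0, 1, 0, 0, as it should, since a fourth-degree polynomial fitted to five points reproduces the data exactly.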

As was pointed out earlier, the central smoothing formulas alone are 
generally useful only for smoothing values at points at least M intervals distant 
from the ends of the range of tabulation. However, they can be used throughout 
the entire range in the special cases when the true function is known to vanish 
outside the range of tabulation and to tend to zero smoothly as the ends of the 
range are approached from the interior, so that the zero values at exterior 
points can be used in smoothing values at interior points near the ends. 


7.16 Recursive Computation of Orthogonal Polynomials on 
Discrete Sets of Points 


Results analogous to those relevant to the Gram approximation can be obtained 
in other cases by similar methods. However, the determination of the associated 
orthogonal polynomials usually is more conveniently effected by recursive 
methods, particularly in those cases when the data points are not equally 
spaced. In this section, the necessary formulas are exhibited. 

Here we denote by \phi_r(s) the rth member of the set of polynomials which 
are orthogonal relative to a certain positive weighting function w(s) over a 
discrete set S_N of N + 1 points, which are not necessarily uniformly spaced. 


† The central formulas (7.15.12) and (7.15.13) are written out explicitly for M = 10 
in Whittaker and Robinson [1944]. 
Again we use the notation <u(s)> to represent the sum of u(s) over S_N. Then, 
by proceeding just as in Sec. 7.10, again we find that the orthogonal 
polynomials satisfy a recurrence formula of the form 


\phi_{k+1}(s) = (a_k s + b_k) \phi_k(s) + c_k \phi_{k-1}(s)        (7.16.1)

where 

a_k = \frac{A_{k+1}}{A_k}        (7.16.2)

and where A_k is the coefficient of s^k in \phi_k(s). 
In complete analogy with the derivation of (7.10.5) and (7.10.6), it is 
found that b_k and c_k satisfy the relations 


b_k = -a_k \frac{\langle s w \phi_k^2 \rangle}{\langle w \phi_k^2 \rangle}        (7.16.3)

and 

c_k = -\frac{a_k}{a_{k-1}} \frac{\langle w \phi_k^2 \rangle}{\langle w \phi_{k-1}^2 \rangle}        (7.16.4)

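As in the continuous case, it is convenient to first generate the polynomials 
\bar{\phi}_0, \bar{\phi}_1, ..., \bar{\phi}_N, for each of which the leading 
coefficient A_k is unity, so that also a_k = 1 in (7.16.1), (7.16.3), and 
(7.16.4). Hence there follows 


\bar{\phi}_0(s) = 1
\bar{\phi}_1(s) = s + b_0        b_0 = -\frac{\langle s w \rangle}{\langle w \rangle}

and then 

\bar{\phi}_2(s) = (s + b_1) \bar{\phi}_1(s) + c_1 \bar{\phi}_0(s)

with 

b_1 = -\frac{\langle s w \bar{\phi}_1^2 \rangle}{\langle w \bar{\phi}_1^2 \rangle}        c_1 = -\frac{\langle w \bar{\phi}_1^2 \rangle}{\langle w \rangle}

and so forth. Afterward, each \bar{\phi}_r(s) can be scaled in whatever way is convenient. 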
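
The recursive generation just described can be sketched in exact arithmetic as follows. This is only an illustration under stated assumptions: the function name monic_orthogonal and the representation of each polynomial as a list of coefficients (lowest power first) are choices made here, not notation from the text, and the recurrence is used with a_k = 1 and with b_k, c_k as in (7.16.3) and (7.16.4).

```python
from fractions import Fraction


def monic_orthogonal(points, weights, max_degree):
    """Monic polynomials orthogonal over a discrete point set.

    Returns a list of coefficient lists (lowest power first) for degrees
    0..max_degree, built from phi_{k+1} = (s + b_k) phi_k + c_k phi_{k-1}.
    max_degree must not exceed len(points) - 1.
    """
    def evaluate(coeffs, s):
        total = Fraction(0)
        for c in reversed(coeffs):      # Horner's rule
            total = total * s + c
        return total

    polys = [[Fraction(1)]]
    norms = []                          # norms[k] = <w phi_k^2>
    for k in range(max_degree):
        phi = polys[k]
        squares = [w * evaluate(phi, s) ** 2 for s, w in zip(points, weights)]
        norm = sum(squares)
        moment = sum(s * q for s, q in zip(points, squares))
        norms.append(norm)
        b = -moment / norm
        nxt = [Fraction(0)] + list(phi)  # s * phi_k
        for i, c in enumerate(phi):      # + b_k * phi_k
            nxt[i] += b * c
        if k > 0:                        # + c_k * phi_{k-1}
            c_k = -norm / norms[k - 1]
            for i, c in enumerate(polys[k - 1]):
                nxt[i] += c_k * c
        polys.append(nxt)
    return polys


# Five equally spaced points with unit weight.
polys = monic_orthogonal([-2, -1, 0, 1, 2], [1] * 5, 4)
```

For these five points the sketch gives s, s^2 - 2, s^3 - (17/5)s, and s^4 - (31/7)s^2 + 72/35 for degrees 1 to 4.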
For the Gram polynomials defined in Sec. 7.13, with c_{rN} = (-1)^r, it 
can be shown that 


\phi_{k+1}(s, N) = \frac{2(2k + 1)}{(k + 1)(N - k)} \left( s - \frac{N}{2} \right) \phi_k(s, N) - \frac{k(k + N + 1)}{(k + 1)(N - k)} \phi_{k-1}(s, N)        (7.16.5)

and accordingly that 

p_{k+1}(t, 2M) = \frac{2(2k + 1)}{(k + 1)(2M - k)} t \, p_k(t, 2M) - \frac{k(k + 2M + 1)}{(k + 1)(2M - k)} p_{k-1}(t, 2M)        (7.16.6)
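
The recurrence for p_k(t, 2M) lends itself to a direct tabulation at the points t = -M, ..., M. The sketch below assumes the three-term form p_{k+1} = [2(2k+1) t p_k - k(k+2M+1) p_{k-1}] / [(k+1)(2M-k)], started from p_0 = 1 with p_{-1} taken as zero; the function name gram_values is hypothetical.

```python
from fractions import Fraction


def gram_values(M, max_degree):
    """Values of p_k(t, 2M), k = 0..max_degree, at t = -M, ..., M.

    max_degree must not exceed 2M, so that the divisor 2M - k stays positive.
    """
    ts = list(range(-M, M + 1))
    previous = [Fraction(0)] * len(ts)   # stands in for p_{-1}
    current = [Fraction(1)] * len(ts)    # p_0
    table = [current]
    for k in range(max_degree):
        denom = (k + 1) * (2 * M - k)
        a = Fraction(2 * (2 * k + 1), denom)
        c = Fraction(k * (k + 2 * M + 1), denom)
        nxt = [a * t * pk - c * pkm
               for t, pk, pkm in zip(ts, current, previous)]
        previous, current = current, nxt
        table.append(nxt)
    return table


table = gram_values(2, 4)
gamma_2 = sum(v * v for v in table[2])   # compare with (7.15.11) for M = 2
```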


7.17 Supplementary References 


Szego [1967] contains the most comprehensive treatments of orthogonal poly- 
nomials. See also P. J. Davis [1963] and the bibliography of Shohat, Hille, 
and Walsh [1940]. 

Least-squares methods are presented in the classical manner in Whittaker 
and Robinson [1944] and in a more general and more modern setting in Guest 
[1961]. See also Aitken [1932a], Birge and Weinberg [1947], Lewis [1947], 
and Hayes and Vickers [1951]. 

The recursive generation of orthogonal polynomials, together with 
associated error analysis, is dealt with in Forsythe [1957]. The Gram poly- 
nomials, orthogonal relative to a unit weighting function on a discrete set of 
equally spaced points, are tabulated by Anderson and Houseman [1942] and 
by De Lury [1950]. Other such tabulations are listed in the index by Fletcher 
et al. [1962]. 

For additional smoothing techniques, see Rhodes [1921], Whittaker and 
Robinson [1944], Sard [1948b], Doodson [1950], Lanczos [1952], and 
Schoenberg [1952]. 


PROBLEMS 
Section 7.2 


1 Show that the functions \phi_0(x) = 1 and \phi_1(x) = x are orthogonal under 
integration over [-1, 1], and obtain the linear least-squares approximation 
y_1(x) to a given function f(x) over [-1, 1] 


f(x) \approx y_1(x) = a_0 + a_1 x        (-1 \le x \le 1)


for which 


\int_{-1}^{1} [f(x) - y_1(x)]^2 \, dx = \min


in the form 

y_1(x) = \frac{1}{2} \int_{-1}^{1} (1 + 3xt) f(t) \, dt


Show also that the corresponding RMS error in [-1, 1] is given by 


\left[ \frac{1}{2} \int_{-1}^{1} f^2 \, dx - a_0^2 - \frac{1}{3} a_1^2 \right]^{1/2}

2 Show that the functions \phi_0(x) = 1 and \phi_1(x) = x are orthogonal under 
summation over the abscissas x_0 = -1, x_1 = 0, and x_2 = 1, and obtain the linear 
least-squares approximation y_2(x) to a given function f(x) over [-1, 1] 


f(x) \approx y_2(x) = A_0 + A_1 x        (-1 \le x \le 1)


for which 


\sum_{r=0}^{2} [f(x_r) - y_2(x_r)]^2 = \min


in the form 

y_2(x) = \frac{1}{6} [(2 - 3x) f(-1) + 2 f(0) + (2 + 3x) f(1)]


Show also that the corresponding RMS error over the three relevant points is 
given by 


\left[ \frac{1}{3} \sum_{r=0}^{2} f^2(x_r) - A_0^2 - \frac{2}{3} A_1^2 \right]^{1/2}


3 If y_1(x) is the linear approximation to f(x) obtained in Prob. 1, and if f''(x) is 
continuous on [-1, 1], show that 


f(x) - y_1(x) = \int_{-1}^{1} g_1(x, s) f''(s) \, ds

where 

g_1(x, s) = (x - s)_+ - \frac{1}{2} \int_{-1}^{1} (t - s)_+ (1 + 3xt) \, dt

          = \begin{cases} -\frac{1}{4} (1 + s)^2 (1 - 2x + sx) & (s \le x) \\ -\frac{1}{4} (1 - s)^2 (1 + 2x + sx) & (s \ge x) \end{cases}


Show also that g_1(x, s) is of constant sign for x and s in [-1, 1] if and only if 
|x| \le 1/3 or |x| = 1, and establish the relation 


f(x) - y_1(x) = -\frac{1}{6} (1 - 3x^2) f''(\xi)        (|x| \le 1/3 or |x| = 1)

where -1 < \xi < 1, showing, in particular, that 


f(-1) - y_1(-1) = \frac{1}{3} f''(\xi_1)        f(0) - y_1(0) = -\frac{1}{6} f''(\xi_2)
f(1) - y_1(1) = \frac{1}{3} f''(\xi_3)
4 If y_2(x) is the linear approximation to f(x) obtained in Prob. 2, and if f''(x) is 
continuous on [-1, 1], show that 


f(x) - y_2(x) = \int_{-1}^{1} g_2(x, s) f''(s) \, ds


where 

g_2(x, s) = (x - s)_+ - \frac{1}{3} (-s)_+ - \frac{1}{6} (1 - s)(2 + 3x)

Show also that g_2(x, s) is of constant sign for x and s in [-1, 1] if and only if 
|x| \le 2/3 or |x| = 1, and establish the relation 


f(x) - y_2(x) = -\frac{1}{6} (2 - 3x^2) f''(\xi)        (|x| \le 2/3 or |x| = 1)


where -1 < \xi < 1. 
5 If y_3(x) is the second-degree polynomial which agrees exactly with f(x) when 
x = -1, 0, and 1, and if y_2(x) is the linear approximation of Prob. 2, show that 


y_2(x) = y_3(x) + \frac{1}{6} (2 - 3x^2) \delta^2 f(0)


where \delta is the central-difference operator with unit spacing. In particular, 
show that 

f(-1) - y_2(-1) = -\frac{1}{2} [f(0) - y_2(0)] = f(1) - y_2(1) = \frac{1}{6} \delta^2 f(0)

that the RMS error over the three points x = -1, 0, and 1 is 

\frac{1}{3\sqrt{2}} |\delta^2 f(0)|

and that 


y_2(x) = f(x) + \frac{1}{6} (2 - 3x^2) f''(x)


(for all values of x) if f(x) is a polynomial of degree 2 or less. 


Section 7.3 


6 If the right-hand member of the rth normal equation associated with (7.3.4) is 
denoted by v_r (r = 0, 1, ..., n), show that the weighted sum of the squares of the 
N + 1 residuals is given by 


\sum_{i=0}^{N} w(x_i) [f(x_i)]^2 - \sum_{r=0}^{n} a_r v_r


and use this relation to calculate that sum for the numerical example of Sec. 7.3. 
7 Suppose that the following empirical data are available: 


x 1.36 1.49 1.73 1.81 1.95 2.16 2.28 2.48 


f(x) 14.094 15.069 16.844 17.378 18.435 19.949 20.963 22.495 


Determine least-squares polynomial approximations y_1(x) and y_2(x) of degrees 1 
and 2, respectively, weighting all data equally, and calculate the RMS value of the 
eight residuals in both cases. 


Section 7.4 


8 Obtain estimated values of the RMS departure between the unknown true 
function f(x) and the observed function f(x) in Prob. 7, based on the 
approximations y_1(x) and y_2(x); and also determine the approximate RMS errors in the 
calculated coefficients involved in those approximations. Would either (or both) 
of the approximations be acceptable if it were known, independently, that the 
RMS value of the observational errors is about 0.04? 

9 The equations 


2.17x_1 + 0.86x_2 + 1.17x_3 = 3.85 
1.06x_1 + 2.81x_2 - 1.21x_3 = 3.03 
1.91x_1 - 1.02x_2 + 3.91x_3 = 4.85 
1.07x_1 + 1.21x_2 + 1.06x_3 = 3.27 


are based on empirical data, and it is known that the RMS errors in the coefficients 
are each about 0.015 and the RMS errors in the right-hand members are each 
about 0.008. Use the method of least squares to obtain approximate values of x_1, 
x_2, and x_3, and give ranges within which each unknown lies with a probability of 
about 0.9. (See Table 1.4 in Sec. 1.7.) 


Section 7.5 


10 With the notation of Sec. 7.5, show that 

a_r = \frac{1}{r! A_r} \frac{\int_a^b U_r(x) f^{(r)}(x) \, dx}{\int_a^b U_r(x) \, dx}

if f^{(r)} = d^r f/dx^r exists on [a, b] and is continuous. 

11 If w(x) = (x - a)^\alpha (b - x)^\beta, verify that 

U_r(x) = C_r (x - a)^{\alpha + r} (b - x)^{\beta + r}

satisfies the conditions of Sec. 7.5 when C_r is a constant if \alpha > -1 and \beta > -1. 

12 If w(x) = x and [a, b] = [0, 1], show that the rth orthogonal polynomial is 
given by 

\phi_r(x) = C_r x^{-1} \frac{d^r}{dx^r} [x^{r+1} (1 - x)^r]

and that the arbitrary normalization \phi_r(0) = 1 requires that 

C_r = \frac{1}{(r + 1)!}

Determine the polynomials of degrees 0 to 4, and prove that 

A_r = (-1)^r \frac{(2r + 1)!}{[(r + 1)!]^2}        \gamma_r = \frac{1}{2(r + 1)^3}

in consequence of the relation 

\int_0^1 x^{p-1} (1 - x)^{q-1} \, dx = \frac{\Gamma(p) \Gamma(q)}{\Gamma(p + q)}

13 Use the results of Prob. 12 to show that the nth-degree least-squares polynomial 
approximation to f(x) over [0, 1], relevant to the weighting function w(x) = x, is 
defined by 


y_n(x) = \sum_{r=0}^{n} a_r \phi_r(x)


where 

a_r = 2(r + 1)^3 \int_0^1 x \phi_r(x) f(x) \, dx


and where \phi_r(x) is defined in Prob. 12. In particular, show that the linear 
approximation is of the form 


y_1(x) = 6 \int_0^1 [(3t - 4t^2) - 2(2t - 3t^2) x] f(t) \, dt


Section 7.6 


14 By expanding (x^2 - 1)^r in descending powers of x^2, and appropriately 
differentiating term by term, show that (7.6.6) implies the relation 


P_r(x) = \frac{1}{2^r} \sum_{k=0} (-1)^k \frac{(2r - 2k)!}{k! (r - k)! (r - 2k)!} x^{r-2k}


where the series terminates when k = r/2 if r is even and when k = (r - 1)/2 if 
r is odd. 

15 Show that the coefficient of P_r(x) in (7.6.11) can be expressed in the form 


a_r = \frac{2r + 1}{2^{r+1} r!} \int_{-1}^{1} (1 - x^2)^r f^{(r)}(x) \, dx

if f^{(r)}(x) exists on [-1, 1] and is continuous. 

16 Show that the leading terms in the Legendre expansion of f(x) = cos (\pi x/2) over 
[-1, 1] are of the form 


\frac{2}{\pi} P_0(x) - \frac{10}{\pi^3} (12 - \pi^2) P_2(x) + \frac{18}{\pi^5} (\pi^4 - 180\pi^2 + 1680) P_4(x) - \cdots


17 Compare numerically the approximations to f(x) = cos (\pi x/2) afforded by the 
least-squares polynomials of degrees 2 and 4, obtained in Prob. 16, with the 
approximating polynomials of corresponding degrees afforded by truncated 
power series and by fitting f(x) exactly at three and at five equally spaced points in 
[-1, 1]. 
18 If f(x) = [(x + 1)/2]^{1/2}, show that the coefficient of P_r(x) in the Legendre 
expansion of f(x) over [-1, 1] is given by 


a_r = \frac{2r + 1}{2^{r+1} r!} \int_{-1}^{1} (1 - x^2)^r \frac{d^r}{dx^r} \left( \frac{x + 1}{2} \right)^{1/2} dx

    = \frac{2r + 1}{r!} \int_0^1 s^r (1 - s)^r \frac{d^r s^{1/2}}{ds^r} \, ds

    = (-1)^{r+1} \frac{2}{(2r - 1)(2r + 3)}


so that 


\left( \frac{x + 1}{2} \right)^{1/2} = \frac{2}{3} P_0(x) + \frac{2}{5} P_1(x) - \frac{2}{21} P_2(x) + \frac{2}{45} P_3(x) - \cdots        (|x| < 1)

19 Assuming the results of Prob. 18, compare the least-squares polynomial 
approximations to f(x) = [(x + 1)/2]^{1/2} of degrees 2 and 4 over [-1, 1] with the 
corresponding results of truncating power series and with the polynomials of 
degrees 2 and 4 which agree with f(x) at three and at five equally spaced points in 
[-1, 1]. 


20 Obtain the expansion 

|x| = \frac{1}{2} P_0(x) + \frac{5}{8} P_2(x) - \frac{3}{16} P_4(x) + \frac{13}{128} P_6(x) - \cdots        (|x| < 1)


and compare the least-squares approximations of degrees 2 and 4 with the corre- 
sponding polynomial approximations which agree exactly with f(x) at three and 
at five equally spaced points in [—1, 1]. 


Section 7.7 


21 By using the Leibnitz formula (3.3.11), show that (7.7.8) implies the relation 


2 (—1)* 2) ae ee pk (") 
ee Ζι (r — ΠῚ τ oP ) a ki 


22 Show that (7.7.15) can be expressed in the form 


_ (-ἰγα 
(rt)? 


if f(x) exists for all x = 0 and is continuous, and if f(x) and its first r derivatives 
are dominated by xT 2¢ ® as x — οὐ. Show also that 
—1)c" 
gee Ea. Ase) 
ri (a — ct 


r 


=I χε Pf (x) dx 
0 


23 


24 
when f(x) = e°*, and that 
a, = (ΓΟ + DF (s > —1) 
α΄ Τὼ — r + 1) 
when f(x) = x*. 
If f(x) = ἢ — (/N)]* when 0 Ξ x S N and f(x) = 0 when x = N, obtain 
the leading terms of the expansion (7.7.16), with « = 1, in the form 


L,(x) 


N 
ἘΞ -α 14 
I) END 


Ν — 6 


= L ean > 0 
2(N + 2)(N + 3) 2x) + ae 


Show that the requirement that the best linear approximation to e^{\alpha x} f(x) be a 
constant, in the sense of (7.7.17), determines \alpha in the form 


\alpha = \frac{\int_0^\infty f(x) \, dx}{\int_0^\infty x f(x) \, dx}


In particular, show that the most appropriate choice of \alpha for Prob. 23 (in this sense) 
is (N + 2)/N. 


Section 7.8 


25 Obtain from (7.8.6) the relations 

\frac{d}{dx} H_r(x) = 2r H_{r-1}(x)

\frac{d}{dx} \left[ e^{-x^2} \frac{d}{dx} H_r(x) \right] = -2r e^{-x^2} H_r(x)

and deduce also that H_r(x) satisfies the differential equation 

H_r'' - 2x H_r' + 2r H_r = 0

26 Use the first relation of Prob. 25, with the relation H_0(x) = 1, to show that the 
coefficient of x^r in H_r(x) is 2^r. Also, by writing 


H_r(x) = \sum_{k=0}^{\infty} a_k x^{r-2k}


in the differential equation of Prob. 25, show that 


a_{k+1} = -\frac{(r - 2k)(r - 2k - 1)}{4(k + 1)} a_k


and deduce that 

H_r(x) = (2x)^r - \frac{r(r - 1)}{1!} (2x)^{r-2} + \frac{r(r - 1)(r - 2)(r - 3)}{2!} (2x)^{r-4} - \cdots

where the series terminates with a multiple of x when r is odd and with a constant 
when r is even. 
27 


28 


29 


30 


Show that (7.8.13) can be written in the form 


α 


rina 


00 
Ϊ eB x) dx 
— 00 


if f(x) exists and is continuous for all x, and if f(x) and its first r derivatives are 
dominated by x~2e**” as x + +0. 
By taking f(x) = e^{2cx} and \alpha = 1 in the result of Prob. 27, obtain the expansion 


e^{2cx - c^2} = \sum_{k=0}^{\infty} \frac{c^k}{k!} H_k(x)


If f(x) = 1 — |x| when |x| S 1 andf(x) = 0 when |x| 2 1, obtain the expansion 


3 — a 


oe ᾿ ε΄ «᾽ν: | Hote Β Η;(αχ) 


7 
45 — 3007 + 4a* 


ἜΤ Hi(ax) -ἰ | 


and show that, if α is chosen such that the coefficient of H, vanishes, there follows 


— 


70) = iP ε΄ 355 [H(V/3 x) -- thoH(V3 x) + °°] 


If the origin is chosen such that m, = 0, and if αὖ is then taken to be mo/(2mz2), 
with the notation of (7.8.17), show that the expansion (7.8.14) becomes 


2 —— 
G0) = Se 7" | m, + MO τ σπιὺ 371) Fr (ax) 
Vn 12 

τ (4a*m, — 12a*mz + 3Π10) 


H. bee 
96 (ax) + | 


Section 7.9 


31 Verify that (7.9.8) satisfies the recurrence formula (7.9.11). 

32 Obtain the expansion 


|x| = \frac{4}{\pi} \left[ \frac{1}{2} T_0(x) + \frac{1}{3} T_2(x) - \frac{1}{15} T_4(x) + \cdots + \frac{(-1)^{k+1}}{4k^2 - 1} T_{2k}(x) + \cdots \right]


when |x| < 1, and compare the least-squares approximations of degrees 2 and 4 
with those obtained in Prob. 20. 

33 Show that the function S_r(x) defined by the relation 


S_r(x) = \frac{1}{r + 1} T_{r+1}'(x)


is a polynomial of degree r, expressible in the form 


S_r(x) = \frac{\sin [(r + 1) \cos^{-1} x]}{\sqrt{1 - x^2}} = \frac{\sin (r + 1)\theta}{\sin \theta}        (\theta = \cos^{-1} x)

and that the polynomials S_0(x), S_1(x), ..., S_r(x), ... are orthogonal over 
[-1, 1] relative to the weighting function w(x) = \sqrt{1 - x^2}. Show also that 

S_{r+1}(x) = 2x S_r(x) - S_{r-1}(x)        (r \ge 1)

that A_r = 2^r and \gamma_r = \pi/2, and that the coefficients in the least-squares 
approximation 

f(x) \approx y_n(x) = \sum_{r=0}^{n} a_r S_r(x)        (|x| \le 1)

are given by 

a_r = \frac{2}{\pi} \int_{-1}^{1} \sqrt{1 - x^2} \, f(x) S_r(x) \, dx

when the requirement 

\int_{-1}^{1} \sqrt{1 - x^2} \, [f(x) - y_n(x)]^2 \, dx = \min


is imposed. 

34 Using the notation of Prob. 33, obtain the expansion 


|x| = \frac{4}{\pi} \left[ \frac{1}{3} S_0(x) + \frac{1}{5} S_2(x) - \frac{1}{21} S_4(x) + \cdots + \frac{(-1)^{k+1}}{(2k - 1)(2k + 3)} S_{2k}(x) + \cdots \right]


and compare the least-squares approximations of degrees 2 and 4 with those 
considered in Prob. 32. 


Section 7.10 


35 If B_k is the coefficient of x^{k-1} in \phi_k(x), show that the coefficient b_k in the 
recurrence formula (7.10.3) is given by 

b_k = a_k \left( \frac{B_{k+1}}{A_{k+1}} - \frac{B_k}{A_k} \right)

so that (7.10.3) can be written in the form 


\phi_{k+1}(x) = \frac{A_{k+1}}{A_k} \left[ x + \left( \frac{B_{k+1}}{A_{k+1}} - \frac{B_k}{A_k} \right) \right] \phi_k(x) - \frac{A_{k+1} A_{k-1}}{A_k^2} \frac{\gamma_k}{\gamma_{k-1}} \phi_{k-1}(x)

36 Use the result of Prob. 35 to derive the relations (7.6.9), (7.7.12), and (7.8.10). 

37 If y_n(x) is the least-squares polynomial approximation of degree n to f(x) over 
[a, b], relative to the weighting function w(x), and if \phi_r(x) is the rth relevant 
orthogonal polynomial, use the Christoffel-Darboux identity to show that 


y_n(x) = \frac{A_n}{A_{n+1} \gamma_n} \int_a^b w(t) f(t) \frac{\phi_{n+1}(x) \phi_n(t) - \phi_n(x) \phi_{n+1}(t)}{x - t} \, dt

38 Prove that the zeros of \phi_m(x) separate the zeros of \phi_{m+1}(x) by the following 
steps (or otherwise): 

(a) Use (7.10.11) to deduce that if x_i and x_{i+1} are successive zeros of \phi_{m+1}(x), 
then \phi_m(x_i) \phi_{m+1}'(x_i) and \phi_m(x_{i+1}) \phi_{m+1}'(x_{i+1}) have the same sign. 

(b) From the established fact that the zeros of \phi_{m+1}(x) are simple, deduce that 
\phi_{m+1}'(x_i) and \phi_{m+1}'(x_{i+1}) are of opposite sign and hence complete the required 
proof. 

39 Use the formulas (7.10.3) and (7.10.13), with w(x) = 1 and [a, b] = [-1, 1], 
to generate the first five of the polynomials \bar{P}_k(x) = P_k(x)/A_k, where P_k(x) is the 
kth Legendre polynomial and A_k its leading coefficient; compare the results with 
(7.6.8). 

40 Show that the monic polynomials \bar{\phi}_k(x) which are orthogonal over [a, b] relative 
to w(x) can be generated recursively by use of the Gram-Schmidt formula 


\bar{\phi}_k(x) = x^k - \sum_{r=0}^{k-1} c_{kr} \bar{\phi}_r(x)

where 

c_{kr} = \frac{\int_a^b w x^k \bar{\phi}_r \, dx}{\int_a^b w \bar{\phi}_r^2 \, dx}

starting with \bar{\phi}_0(x) = 1. (First take k = 1; then use an inductive argument to 
derive the formula.) Also use this method in place of that used in Prob. 39 to 
determine P_k(x)/A_k (k = 0, 1, ..., 5) and compare the amounts of labor involved 
in the two processes. 
41 Use Clenshaw’s method to evaluate the sum 


κε 


to five places when x = 0.10324. 


Section 7.11 


42 Express the following sums in closed forms: 

(a) 1·2 + 2·3 + ··· + n(n + 1) = \sum_{s=2}^{n+1} s^{(2)}

(b) 1·2·3 + 2·3·4 + ··· + n(n + 1)(n + 2) 

(c) 1·2 + 4·5 + 7·8 + ··· + (3n - 2)(3n - 1) 


43 Express the following sums in closed forms, and determine the limit of each as 
n \to \infty: 

(a) \frac{1}{1·2} + \frac{1}{2·3} + ··· + \frac{1}{n(n + 1)} = \sum_{s=0}^{n-1} s^{(-2)}

(b) \frac{1}{1·2·3} + \frac{1}{2·3·4} + ··· + \frac{1}{n(n + 1)(n + 2)}

(c) \frac{1}{1·2·3} + \frac{3}{2·3·4} + \frac{5}{3·4·5} + ··· + \frac{2n - 1}{n(n + 1)(n + 2)}


44 


45 


46 


47 


48 


49 


50 


iM 
Show that 
(—m)™ = (—1)"m + n — 1) 


5 Dak ad | FP 


when ἢ is a positive integer. 
Show that 


Use the binomial expansion of (1 + x)" to deduce the following relations: 


ὌΝ 
© > (i) 2 


ὦ) SI) (7) = 0 


k=0 
Show that 
= (n\? _ [2n 
> k n 
k=0 
[Use (7.11.13) and (7.11.12). ] 
If m is a positive integer, show that 


and hence that 


and deduce the relation 


‘Show that 


N 
Σ s Aju, - [5 Au, ἘΠ Waly 
S=M 


and use this formula to obtain the result 
x 1 
>, σα = ——— [ΝαΝῚΣ- (N+ 1)αΝ 11 4+ 4] (a 1) 
5Ξ0 (α -- 1)? 


by taking u, = a‘/(a — 1)? or otherwise. 
Show that 


Us Aivs = [᾿ς ΔΙ ὗν, = (A, u,)(AT 7054 1) πῷ (Aju,)(A, 30,42) 


N 
So (1) (7, a αὶ (--1}} Σ᾽ (Aj μρ)υς εν 
and also that 


N 
> us Δίυς = [us—1 ΔΊ σῦς — (Arus—2(AT 705) + (Δῆμ. Δ)(ΔῈ 00) 


s= 


N 
Hae (HIATT Dee! + (HDD (Δἴα, 


Section 7.12 
51 Show that (7.12.13) can be written in the form 
N 
y(N) = (---Ἠ 1» AAN) >, Us + 7, N) 
s=0 


where A,(N) is the coefficient of s” in ¢,(s, N). 
52 Show that the minimum value of the left-hand member of (7.12.11) can be 


expressed in the form 


N n 
A,AN) = 2 w(s)[F(a + sh)? — 2, γ(Ν)αΐ 


Section 7.13 
53 With the notation of Prob. 51, show that, when (7.13.9) is imposed on (7.13.6), 


there follows 


ἘΣ ς 2_ (-1)2)! ς (r) τ N— 1) 
HN) = > [dls NYP = ae Why + r)%s+r-N-1) 


By making use of appropriate summations by parts, show further that 


1 N 
7(N) = ae > (s + py” 
(NOY 2: 


and deduce the closed form 
1 (N+tr4+1%t? 


(AN) = 
ne 2. +1 No 


54 Use the results of Prob. 53 and of Eqs. (7.13.10) to express the leading terms of 
(7.13.12) in the form 
M 


1 ; 31 Σ 
we 2M + i 2. Ie) M(M + 1)2M + 1) —, na 


ii 1512 — 5M(M + 1) 
M(M + 1)2M — 1)2M + 1)2M + 3) 


x Σ [3.2 — M(M + 111) Ἐ τ.’ 
Κε-Μ 


Section 7.14 


55 


56 


57 


58 


Prepare a table analogous to Table 7.1 in the case M = 3, when seven points 
are employed in the least-squares approximation, including only the orthogonal 
polynomials of degrees 5 and less. 

By using the table prepared in Prob. 55, obtain least-squares polynomial 
approximations of degrees 1 to 5 to f(t) = F(3/2 + t/2) for |t| \le 3 from the following 
approximate data, calculate the respective smoothed values at the tabular points, 
and determine which approximation is probably most appropriate if the data are 
empirical, with errors having an estimated RMS value of about 0.07: 


x        0.0      0.5      1.0      1.5      2.0      2.5      3.0 

F(x)   15.564   18.059   20.548   23.554   26.348   29.498   32.830 


Use the results of Prob. 56 to obtain approximate values of the following quantities 
from the smoothed data: 


Ε(01)  F8) ΕΟ  F(1.3) [ Ἵ F(x) dx | ** F(x) dx 


0 1.2 


Obtain a formula analogous to (7.14.8) for fifth-degree seven-point least-squares 
approximation. 


Section 7.15 


59 


60 


Use (7.15.4) and (7.15.5) to obtain smoothed values of the data given in Prob. 56, 
corresponding to third-degree and fifth-degree least squares, and verify that the 
results agree with those obtained in Prob. 56. 

The following data represent estimated world route mileages of scheduled air 
services in the years given in units of 1000 miles. Calculate smoothed values, using 
both first- and third-degree five-point formulas, and plot the two smoothed curves 
together with points representing the given data. 


1919 3.2 1926 48.5 1933 200.3 
1920 9.7 1927 54.7 1934 223.1 
1921 12.4 1928 90.7 1935 278.2 
1922 16.0 1929 125.8 1936 305.2 
1923 16.1 1930 156.8 1937 333.5 
1924 20.3 1931 185.1 1938 349.1 
1925 34.0 1932 190.2 


Also use the two sets of smoothed data to obtain estimates of the annual rate of 
increase of mileage, at the end of the tabulation, to be used for long- and short- 
range predictions. 


Section 7.16 


61 Prove that if w(s) > 0 for s = 0, 1, ..., N, then the polynomial \phi_r(s) possesses r 
real zeros in the interval 0 < s < N. 

62 Prove that the zeros of \phi_r(s) separate the zeros of \phi_{r+1}(s). (See Prob. 38.) 

63 Use the recursive method of Sec. 7.16 to generate the polynomials \phi_r(s) 
(r = 0, 1, 2, 3, 4) when w(s) = 1 and the set S_N comprises the five points s = 
-2, -1, 0, 1, 2, and check the results by reference to (7.14.1). 

64 Modify the Gram-Schmidt formula of Prob. 40 to apply to the discrete case. 
Also use it recursively in Prob. 63 in place of the suggested method and compare 
the two methods with respect to amounts of labor involved. 


8 


GAUSSIAN QUADRATURE AND 
RELATED TOPICS 


8.1 Introduction 


The formulas given in Chaps. 3 and 5, for the purpose of numerical integration 
(with or without differences), each involve sets of ordinates which correspond 
to equally spaced abscissas. As might be expected, corresponding formulas 
which are generally capable of supplying comparable accuracy with fewer (about 
half as many) ordinates can be obtained by determining the optimal distribution 
of the abscissas, rather than prescribing them in an arbitrary way. It is found 
that the abscissas so determined are generally specified by irrational numbers 
and that the same is usually true of the weights by which the corresponding 
ordinates are to be multiplied. 

As a specific, but typical, example, which may be helpful in motivating 
some remarks with regard to such formulas, the five-point Newton-Cotes 
formula (3.5.13), of closed type, is of the form 


\int_{-1}^{1} f(x) \, dx = \frac{1}{45} [7f(-1) + 32f(-\tfrac{1}{2}) + 12f(0) + 32f(\tfrac{1}{2}) + 7f(1)] - \frac{f^{(6)}(\xi_1)}{15120}        (8.1.1)

when related to the interval [-1, 1], whereas the Legendre-Gauss three-point 
formula, to be derived in Sec. 8.5, is of the form 


\int_{-1}^{1} f(x) \, dx = \frac{1}{9} \left[ 5f\left(-\sqrt{3/5}\right) + 8f(0) + 5f\left(\sqrt{3/5}\right) \right] + \frac{f^{(6)}(\xi_2)}{15750}        (8.1.2)


where both \xi_1 and \xi_2 lie somewhere in [-1, 1]. 

A comparison of the two error terms shows that the second formula, 
which requires the values of only three ordinates, generally may be expected 
to afford about the same accuracy as the first, which requires five ordinates, 
when the error terms are neglected. Also, since the weights are positive in both 
formulas, the error in the result, due to possible errors in the ordinates, cannot 
exceed (but can equal) twice the maximum of those errors in both cases. More- 
over, if random errors in the ordinates are considered, the corresponding RMS 
errors in the approximations afforded by the first and second formulas are found 
to be given by about 0.48 and about 0.68 times the RMS ordinate error, 
respectively. 
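
The comparable accuracy of the two formulas can be illustrated numerically. The sketch below applies both rules, with error terms dropped, to f(x) = e^x on [-1, 1]; the test function and the function names are choices made here for illustration and do not come from the text.

```python
import math


def newton_cotes_5(f):
    """Five-point closed Newton-Cotes rule on [-1, 1], error term omitted."""
    return (7 * f(-1.0) + 32 * f(-0.5) + 12 * f(0.0)
            + 32 * f(0.5) + 7 * f(1.0)) / 45


def gauss_legendre_3(f):
    """Three-point Legendre-Gauss rule on [-1, 1], error term omitted."""
    x = math.sqrt(3 / 5)
    return (5 * f(-x) + 8 * f(0.0) + 5 * f(x)) / 9


exact = math.e - 1 / math.e                  # integral of e^x over [-1, 1]
err_nc = newton_cotes_5(math.exp) - exact    # five ordinates
err_gl = gauss_legendre_3(math.exp) - exact  # three ordinates
```

Both rules integrate polynomials of degree five or less exactly, and for this integrand the two errors are of nearly equal magnitude and opposite sign, in line with the error terms of the two formulas.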

Thus the apparent advantage of the second formula consists of the fact 
that, aside from the central ordinate, which is needed in both, it involves only 
half as many ordinates as the first. However, unless f(x) is a polynomial (in 
which case the formulas are not needed) the required ordinates are generally to 
be obtained by reference to a table of values of f(x). It is then sometimes 
argued that, since two of the abscissas in (8.1.2) are irrational, interpolation 
involving at least two tabulated ordinates will be required for the determination 
of each of the two off-center ordinates, so that at least five ordinates truly will 
be involved in the use of (8.1.2). Thus, the apparent advantage is lost, and even 
reversed, since (8.1.1) involves the five ordinates needed in a simple and specific 
form. 

For this reason, and also because of the fact that the weights in most 
gaussian formulas are also irrational (the present case is an exception), so that, 
in place of multiplying each ordinate by an integer, one must multiply it by a 
number with at least as many significant digits as are required in the final result, 
relatively little practical use was made of such formulas until recent years. 

This situation was indeed unfortunate, since the second reason given, 
while an important one when calculations are necessarily effected by hand, 
slide rule, or use of tables of logarithms, is clearly of no significance when a 
computing device with even the relatively limited efficiency of a desk calculator 
is available, and the argument supplying the first reason is (rather obviously) 
generally fallacious. Specifically, it assumes that the ordinates denoted as 
f(−1), f(−1/2), and so forth, are known or can be found directly in tables, 
without the need of interpolation, and is valid only then. 

It is true that available tables of many functions, such as $e^{-x^2}$ and $J_0(x)$,
for example, include these arguments, and these are typical of the functions
which most frequently appear in textbooks. But practical problems tend to
deal instead with functions such as $e^{-\alpha^2 x^2/L^2}$ and $J_0(\alpha x/L)$ over the interval
[−L, L] and, correspondingly, with $e^{-\alpha^2 x^2}$ and $J_0(\alpha x)$ over the normalized
interval [−1, 1], where α is a function of certain physical quantities and is most
unlikely to have an integral (or rational) value. Thus, in practical situations,
it is probable that each of the ordinates appearing in either of the forms (8.1.1)
and (8.1.2) will have to be determined by interpolation (or by direct calculation),
and the interpolation for $J_0(\alpha\sqrt{15}/5)$ would be more difficult than that for
$J_0(\alpha/2)$ only in that the determination of the numerical argument of the
interpolate in the former case would involve a multiplication of two n-digit numbers.
The necessary accuracy of the interpolation or calculation would be no higher
in one case than in the other.

Further, it may be noted that, when use is made of a large-scale digital 
computer, and when f(x) is defined analytically, approximate values of the 
integrand often are not obtained by interpolation in tables in any case, but are 
generated directly within the computer by subroutines incorporated in the 
program. Here, since the machine does not distinguish between rational and
irrational arguments, the approximate evaluation of $f(\sqrt{15}/5)$ is in no way
more complicated than that of f(1/2). Thus, formulas such as (8.1.2) are indeed
advantageous when the determination of ordinates needed for the conventional
formulas would involve either direct calculation, physical measurement, or
interpolation and when the use of a high-precision formula is appropriate.†

The developments of this chapter relate these formulas to a method of 
osculating interpolation, associated with the name of Hermite, which is treated 
in Sec. 8.2, and to an associated quadrature formula (Sec. 8.3). Several of the 
classical quadrature formulas of the gaussian type, in which no abscissas are 
arbitrarily preassigned, are considered, together with their error terms, in the 
subsequent sections, which depend upon certain results from Chap. 7. Section 
8.10 deals with the modifications necessary when certain of the abscissas are 
preassigned, and the results are illustrated in the next two sections. The con- 
vergence of sequences of approximations generated by the formulas considered, 
as the number of ordinates used is increased, is considered in Sec. 8.13. Section 
8.14 deals with a special class of quadrature formulas in which the weights, 
rather than the abscissas, are preassigned; and Sec. 8.15 deals with algebraic
methods for deriving quadrature formulas of the type considered in this chapter,
without making use of the properties of orthogonal polynomials. The concluding
section illustrates these procedures by obtaining a pair of simple formulas which
approximate finite Fourier-transform integrals.

† Perhaps because of the fact that they are particularly useful when the integrand is
defined analytically, they are usually called quadrature formulas, whereas formulas of
the usual type are usually called integration formulas. There is no basic distinction
between the terms.


8.2 Hermite Interpolation 


The interpolation formulas so far considered make use only of a certain number 
of values (approximate or exact) of the function to be approximated. Except 
in the case of the least-squares formulas of the preceding chapter, the approximat- 
ing polynomial y(x) has been defined as that polynomial of lowest degree which 
agrees with the approximated function f(x) on a certain discrete set of points. 
In certain cases, values of both f(x) and its derivative f’(x) are available, say, 
at m points.† We next derive an interpolation formula which utilizes these 2m 
data and, in the remainder of this chapter, show that the result leads to useful 
formulas for numerical integration which do not depend upon knowledge of 
values of f’(x). 

† In the preceding chapters, integration formulas were obtained by integrating the
interpolation polynomial of degree n which agrees with the integrand at n + 1
points, so that the principal emphasis was on the degree n of that interpolation
polynomial, and it was convenient to number the relevant n + 1 abscissas from 0
to n. On the other hand, the derivations of the integration formulas which are to be
treated in the present chapter are based on certain properties of the polynomial
whose zeros are the abscissas of the points involved in the integration formula. It is
thus more convenient to use a new symbol m to represent the degree of that
polynomial, and hence also to represent the number of ordinates employed, and to
number the ordinates from 1 to m.

Before proceeding to these matters, however, it is desirable to review the
lagrangian interpolation formula treated in Chap. 3 and to write it in a slightly
modified form. If the values of f(x) are known at the m points $x = x_1, x_2, \ldots, x_m$,
the auxiliary functions

$$\pi(x) = (x - x_1)(x - x_2)\cdots(x - x_m) \tag{8.2.1}$$

and

$$l_i(x) = \frac{\pi(x)}{(x - x_i)\,\pi'(x_i)} = \frac{(x - x_1)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_m)}{(x_i - x_1)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_m)} \qquad (i = 1, 2, \ldots, m) \tag{8.2.2}$$

are first defined, with the properties

$$\pi(x_i) = 0 \tag{8.2.3}$$

and

$$l_i(x_j) = \delta_{ij} \tag{8.2.4}$$

where $\delta_{ij}$ is the Kronecker delta (zero when i ≠ j and unity when i = j).
With these notations, the polynomial of degree m − 1 which takes on the values
$f(x_1), f(x_2), \ldots,$ and $f(x_m)$ is expressible in the form

$$y(x) = \sum_{k=1}^{m} l_k(x)\,f(x_k) \tag{8.2.5}$$

Also, if $f^{(m)}(x)$ is continuous in the interval I limited by the largest and smallest
of the m + 1 numbers $x_1, x_2, \ldots, x_m$, and x, the error

$$E(x) = f(x) - y(x)$$

is expressible in the forms

$$E(x) = \pi(x)\,f[x_1, \ldots, x_m, x] = \frac{1}{m!}\,f^{(m)}(\xi)\,\pi(x) \tag{8.2.6}$$

where ξ is somewhere in I.

Now suppose that values of both f(x) and f′(x) are known for $x_1, \ldots, x_m$.
Since a polynomial of degree 2m − 1 is specified by 2m parameters, it is
plausible that one such polynomial y(x) can be determined in such a way that
y(x) and f(x) possess the same value and the same derivative at each of these m
points. We next attempt to determine such a polynomial by assuming that it is
expressible in the form

$$y(x) = \sum_{k=1}^{m} h_k(x)\,f(x_k) + \sum_{k=1}^{m} \bar{h}_k(x)\,f'(x_k) \tag{8.2.7}$$

where $h_i(x)$ and $\bar{h}_i(x)$ (i = 1, ..., m) are polynomials of maximum degree
2m − 1, to be determined.

The requirement that $y(x_j) = f(x_j)$ clearly will be satisfied if

$$h_i(x_j) = \delta_{ij} \qquad \bar{h}_i(x_j) = 0 \tag{8.2.8}$$

whereas the requirement $y'(x_j) = f'(x_j)$ will be satisfied if

$$h_i'(x_j) = 0 \qquad \bar{h}_i'(x_j) = \delta_{ij} \tag{8.2.9}$$

for 1 ≤ i ≤ m and 1 ≤ j ≤ m. Now, since $l_i(x)$ is a polynomial of degree
m − 1 which satisfies (8.2.4), the function $[l_i(x)]^2$ is a polynomial of degree
2m − 2 which satisfies (8.2.4) and whose derivative vanishes at $x_j$ when i ≠ j.
Hence, since $h_i(x)$ and $\bar{h}_i(x)$ are polynomials of degree 2m − 1, there must
follow

$$h_i(x) = r_i(x)[l_i(x)]^2 \qquad \bar{h}_i(x) = s_i(x)[l_i(x)]^2 \tag{8.2.10}$$

where $r_i(x)$ and $s_i(x)$ are linear functions of x, in order that the conditions of
(8.2.8) and (8.2.9) be satisfied when i ≠ j. The other four conditions then give

$$r_i(x_i) = 1 \qquad r_i'(x_i) + 2l_i'(x_i) = 0 \tag{8.2.11}$$

and

$$s_i(x_i) = 0 \qquad s_i'(x_i) = 1 \tag{8.2.12}$$

from which there follows

$$r_i(x) = 1 - 2l_i'(x_i)(x - x_i) \qquad s_i(x) = x - x_i \tag{8.2.13}$$

Hence, by combining (8.2.7), (8.2.10), and (8.2.13), we obtain the desired
polynomial in the form

$$y(x) = \sum_{k=1}^{m} h_k(x)\,f(x_k) + \sum_{k=1}^{m} \bar{h}_k(x)\,f'(x_k) \tag{8.2.14}$$

where

$$h_i(x) = \left[1 - 2l_i'(x_i)(x - x_i)\right][l_i(x)]^2 \tag{8.2.15}$$

$$\bar{h}_i(x) = (x - x_i)[l_i(x)]^2 \tag{8.2.16}$$

This result is known as Hermite's interpolation formula (Hermite [1878]) or,
frequently, as the formula for osculating interpolation. (For a more general
formula, which also uses values of higher derivatives of f(x), see Fort [1948].)
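
As a minimal sketch of (8.2.14) to (8.2.16), the following Python function evaluates the osculating polynomial directly from the abscissas, ordinates, and slopes. The function and argument names are illustrative assumptions, not notation from the text, and no attempt is made at efficiency. It uses the fact that l_i′(x_i) is the sum of 1/(x_i − x_j) over j ≠ i.

```python
def hermite_interpolate(xs, fs, dfs, x):
    """Evaluate the Hermite (osculating) interpolation polynomial at x."""
    total = 0.0
    for i, xi in enumerate(xs):
        li = 1.0    # l_i(x), the lagrangian coefficient function
        dli = 0.0   # l_i'(x_i) = sum over j != i of 1 / (x_i - x_j)
        for j, xj in enumerate(xs):
            if j != i:
                li *= (x - xj) / (xi - xj)
                dli += 1.0 / (xi - xj)
        h = (1.0 - 2.0 * dli * (x - xi)) * li * li   # coefficient of f(x_i)
        hbar = (x - xi) * li * li                    # coefficient of f'(x_i)
        total += h * fs[i] + hbar * dfs[i]
    return total

# Three points determine a polynomial of degree 2m - 1 = 5, so x^5 is reproduced.
xs = [0.0, 1.0, 2.0]
fs = [t ** 5 for t in xs]
dfs = [5.0 * t ** 4 for t in xs]
value = hermite_interpolate(xs, fs, dfs, 1.5)
print(value, 1.5 ** 5)
```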

An expression for the error E(x) = f(x) − y(x) can be obtained by a
method similar to that used in Sec. 2.6. Thus we notice that both E(x) and
$[\pi(x)]^2$ vanish together with their first derivatives at each of the m points
$x = x_1, \ldots, x_m$. We then form a linear combination of these functions

$$F(x) = f(x) - y(x) - K[\pi(x)]^2 \tag{8.2.17}$$

which therefore has the same properties, and determine K in such a way that
F(x) also vanishes at an arbitrarily chosen additional point $x = \bar{x}$.

Now let I represent the closed interval limited by the smallest and largest
of the numbers $x_1, x_2, \ldots, x_m$, and $\bar{x}$. Since F(x) vanishes at these m + 1
distinct points, F′(x) must vanish at at least m intermediate points inside I.
But since F′(x) also vanishes at the m points $x_1, \ldots, x_m$, it vanishes at least 2m
times in I. Thus F″(x) vanishes at least 2m − 1 times in I, F‴(x) at least
2m − 2 times, ..., and hence, finally, $F^{(2m)}(x)$ vanishes at least once inside I,
assuming the existence of the derivatives considered. Let one such point be ξ.
Then, recalling that y(x) is a polynomial of degree 2m − 1, and hence that
$y^{(2m)}(x) \equiv 0$, we obtain from (8.2.17) the result

$$0 = F^{(2m)}(\xi) = f^{(2m)}(\xi) - K(2m)!$$

or

$$K = \frac{f^{(2m)}(\xi)}{(2m)!}$$

Thus, since $F(\bar{x}) = 0$, there follows

$$E(\bar{x}) = f(\bar{x}) - y(\bar{x}) = \frac{f^{(2m)}(\xi)}{(2m)!}[\pi(\bar{x})]^2$$

where ξ is somewhere in I. Since both sides of this relation vanish when $\bar{x}$ is
identified with one of the points $x_i$, the relation is true also for such values of $\bar{x}$
and hence for any $\bar{x}$. Hence, suppressing the bars, we deduce that the error
associated with approximating f(x) by the right-hand member of (8.2.14) is
of the form

$$E(x) = \frac{f^{(2m)}(\xi)}{(2m)!}[\pi(x)]^2 \tag{8.2.18}$$

where ξ is somewhere in the interval I.

It can be seen that the polynomial (8.2.14) could also be obtained as the
limit of the polynomial of degree 2m − 1 which agrees with f(x) at 2m distinct
points $x_1, \bar{x}_1, x_2, \bar{x}_2, \ldots, x_m, \bar{x}_m$, as each $\bar{x}_i$ tends to $x_i$. From this fact we can
conclude that the error E(x) also can be expressed in the form

$$E(x) = [\pi(x)]^2 f[x_1, x_1, \ldots, x_m, x_m, x] \tag{8.2.19}$$

assuming only that f′(x) exists in I.

From (8.2.18) it follows that the Hermite m-point formula yields exact
results when f(x) is identified with any polynomial of degree not exceeding
2m − 1. In particular, we may deduce easily that the interpolation polynomial
(8.2.14) is the only one having the specified properties.


8.3 Hermite Quadrature 
From the Hermite interpolation formula we may deduce the formula

$$\int_a^b w(x) f(x)\,dx = \sum_{k=1}^{m} H_k f(x_k) + \sum_{k=1}^{m} \bar{H}_k f'(x_k) + E \tag{8.3.1}$$

with the weighting coefficients defined by the equations

$$H_i = \int_a^b w(x) h_i(x)\,dx = \int_a^b w(x)\left[1 - 2l_i'(x_i)(x - x_i)\right][l_i(x)]^2\,dx \tag{8.3.2}$$

and

$$\bar{H}_i = \int_a^b w(x)\bar{h}_i(x)\,dx = \int_a^b w(x)(x - x_i)[l_i(x)]^2\,dx \tag{8.3.3}$$

and with the error expressible in the form

$$E = \frac{1}{(2m)!}\int_a^b f^{(2m)}(\xi)\,w(x)[\pi(x)]^2\,dx \tag{8.3.4}$$

where a < ξ < b if the points $x_1, x_2, \ldots, x_m$ lie in that interval. The result
of neglecting the error term is called the Hermite quadrature formula.

If the weighting function w(x) is nonnegative in [a, b],

$$w(x) \ge 0 \tag{8.3.5}$$

as will be assumed throughout this chapter, the coefficient of $f^{(2m)}(\xi)$ in the
integrand of (8.3.4) is nonnegative. Hence the second law of the mean may be
invoked to permit (8.3.4) to be written in the more convenient form

$$E = \frac{f^{(2m)}(\xi)}{(2m)!}\int_a^b w(x)[\pi(x)]^2\,dx \tag{8.3.6}$$

These results may be compared with the result which corresponds to
lagrangian interpolation employing m points, which can be expressed in the form

$$\int_a^b w(x) f(x)\,dx = \sum_{k=1}^{m} W_k f(x_k) + E \tag{8.3.7}$$

where

$$W_i = \int_a^b w(x) l_i(x)\,dx \tag{8.3.8}$$

and

$$E = \frac{1}{m!}\int_a^b f^{(m)}(\xi)\,w(x)\pi(x)\,dx \tag{8.3.9}$$

Since π(x) changes sign at each of the points $x_1, \ldots, x_m$, the law of the mean
cannot be applied directly to (8.3.9) to produce a form analogous to (8.3.6).

If a quadrature formula yields exact results when f(x) is an arbitrary
polynomial of degree r or less, but fails to give exact results for at least one
polynomial of degree r + 1, it is said to possess a degree of precision equal to r
(see Sec. 5.10). From the linearity of the process, it follows that this situation
exists if and only if exact results are afforded for $1, x, x^2, \ldots, x^r$, but not for
$x^{r+1}$.

From (8.3.6) we see that the degree of precision of the Hermite m-point
formula is exactly 2m − 1. It follows also from (8.3.9) that the degree of
precision of the lagrangian quadrature formula, based on m points, is at least
m − 1. Furthermore, if we take $f(x) = [\pi(x)]^2$, we see that all terms in the sum
involved in (8.3.7) vanish, and hence, for this function, the lagrangian formula
would give

$$\int_a^b w(x)[\pi(x)]^2\,dx = 0$$

Under the assumption (8.3.5), this situation is impossible. Hence, since $[\pi(x)]^2$
is a polynomial of degree 2m, it follows that the degree of precision of the
lagrangian m-point formula cannot exceed 2m − 1. Unless further information
concerning the choice of the points $x_1, \ldots, x_m$ is available, no more specific
statement can be made. However, it is shown in the following section that
there exists a class of formulas of the simple lagrangian type (8.3.7) which
actually have the maximum degree of precision 2m − 1.
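
The bounds m − 1 and 2m − 1 on the degree of precision can be checked empirically for w(x) = 1. The sketch below, with a hypothetical helper name and a tolerance chosen only for illustration, tests a rule on the monomials 1, x, x², ... in turn, which suffices by the linearity argument above. It compares two three-point rules on [−1, 1]: Simpson's rule and the Legendre-Gauss rule (8.1.2).

```python
import math

def degree_of_precision(nodes, weights, a=-1.0, b=1.0, max_degree=20, tol=1e-12):
    """Largest r such that 1, x, ..., x^r are integrated exactly (w(x) = 1)."""
    r = -1
    for k in range(max_degree + 1):
        exact = (b ** (k + 1) - a ** (k + 1)) / (k + 1)
        approx = sum(w * x ** k for x, w in zip(nodes, weights))
        if abs(exact - approx) > tol:
            break
        r = k
    return r

g = math.sqrt(15.0) / 5.0
simpson = degree_of_precision([-1.0, 0.0, 1.0], [1 / 3, 4 / 3, 1 / 3])
gauss3 = degree_of_precision([-g, 0.0, g], [5 / 9, 8 / 9, 5 / 9])
print(simpson, gauss3)
```

With m = 3 the admissible range is 2 to 5; the equally spaced rule falls inside that range, whereas the gaussian choice of abscissas attains the upper limit.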


8.4 Gaussian Quadrature 


An inspection of (8.3.1) shows that, if the points $x_1, \ldots, x_m$ can be chosen in
such a way that the weighting coefficients $\bar{H}_i$ associated with the derivative
terms vanish, then the Hermite formula will reduce to a formula of the simple
type (8.3.7) while retaining the degree of precision 2m − 1. With the notation
of (8.2.1) and (8.2.2), the definition (8.3.3) can be expressed in the equivalent
form

$$\bar{H}_i = \frac{1}{\pi'(x_i)}\int_a^b w(x)\pi(x)l_i(x)\,dx \tag{8.4.1}$$

where, as before,

$$\pi(x) = (x - x_1)(x - x_2)\cdots(x - x_m) \tag{8.4.2}$$

so that $x_1, \ldots, x_m$ are the m zeros of π(x).

Thus $\bar{H}_i$ will vanish for 1 ≤ i ≤ m, and the degree of precision 2m − 1
will be preserved, if π(x) is orthogonal to $l_1(x), \ldots, l_m(x)$ over [a, b], relative
to the weighting function w(x). Since each $l_i(x)$ is a polynomial of degree m − 1,
by virtue of (8.2.2), a sufficient condition is that π(x) be orthogonal to all
polynomials of degree inferior to m over [a, b], relative to w(x).

This condition is also necessary. To see this, assume that

$$\bar{H}_i = 0 \qquad (1 \le i \le m) \tag{8.4.3}$$

and that the formula has a degree of precision 2m − 1. Let f(x) be a polynomial,
of degree 2m − 1 or less, expressed in the special form

$$f(x) = \pi(x)\,u_{m-1}(x) \tag{8.4.4}$$

where $u_{m-1}(x)$ is an arbitrary polynomial of degree m − 1 or less. Then,
since $\pi(x_i) = 0$ for 1 ≤ i ≤ m, there follows $f(x_i) = 0$, and hence, for this
polynomial, (8.3.1) becomes

$$\int_a^b w(x) f(x)\,dx = \int_a^b w(x)\pi(x)u_{m-1}(x)\,dx = 0 \tag{8.4.5}$$

as was to be shown, since $\bar{H}_i = 0$ by assumption and E = 0 by virtue of the
fact that here $f^{(2m)}(x) \equiv 0$.

Hence we deduce that if and only if the polynomial π(x), of degree m, is
orthogonal to all polynomials of inferior degree over [a, b], relative to w(x),
the Hermite quadrature formula reduces to the formula

$$\int_a^b w(x) f(x)\,dx = \sum_{k=1}^{m} H_k f(x_k) + E \tag{8.4.6}$$

where

$$E = \frac{f^{(2m)}(\xi)}{(2m)!}\int_a^b w(x)[\pi(x)]^2\,dx \tag{8.4.7}$$

and where the m abscissas $x_1, \ldots, x_m$ are the zeros of π(x).

A formula of this type is usually called a gaussian quadrature formula, 
although it appears that only the case in which w(x) = 1 was explicitly con- 
sidered by Gauss, the generalization to other weighting functions being due to 
Christoffel [1858]. The weights $H_i$ accordingly are sometimes called the
Christoffel numbers of the formula.

Since (8.4.6) is a special case of both (8.3.1) and (8.3.7), the weighting
coefficients $H_i$ and $W_i$ given by (8.3.2) and (8.3.8) must be equal in this case.
(See also Prob. 7.) Thus we may write

$$H_i = \int_a^b w(x)[l_i(x)]^2\,dx = \int_a^b w(x)l_i(x)\,dx = W_i \tag{8.4.8}$$

the first form being obtained from (8.3.2) by writing that formula in the form

$$H_i = \int_a^b w(x)[l_i(x)]^2\,dx - 2l_i'(x_i)\bar{H}_i$$

and recalling that here $\bar{H}_i = 0$.

The polynomial π(x) is precisely that numerical multiple of the polynomial
$\phi_m(x)$, specified by Eqs. (7.5.4) to (7.5.7), for which the coefficient of the leading
power of x is unity. Thus, as was shown in Sec. 7.10, its m zeros are indeed real
and distinct and are all located in the interval (a, b). The interval need not be of
finite length, as long as w(x) ≥ 0 and the integral $\int_a^b x^k w(x)\,dx$ exists for all
nonnegative integral values of k. It is of particular importance to notice that,
by virtue of the first form of (8.4.8), the weighting coefficients in a gaussian
quadrature formula are all positive.
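
The equalities in (8.4.8), together with the vanishing of the derivative weights, can be verified for a concrete case. The following sketch assumes w(x) = 1 on [−1, 1] and the three abscissas of (8.1.2); polynomials are stored as coefficient lists in ascending powers, and all helper names are hypothetical.

```python
import math

def poly_mul(p, q):
    """Product of two polynomials given as ascending coefficient lists."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def integrate(p, a, b):
    """Exact integral of a polynomial over [a, b]."""
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1) for k, c in enumerate(p))

def lagrange_poly(xs, i):
    """Coefficients of the lagrangian coefficient function l_i(x)."""
    p = [1.0]
    for j, xj in enumerate(xs):
        if j != i:
            d = xs[i] - xj
            p = poly_mul(p, [-xj / d, 1.0 / d])
    return p

xs = [-math.sqrt(15.0) / 5.0, 0.0, math.sqrt(15.0) / 5.0]
results = []
for i, xi in enumerate(xs):
    li = lagrange_poly(xs, i)
    li2 = poly_mul(li, li)
    W = integrate(li, -1.0, 1.0)                          # lagrangian weight
    H = integrate(li2, -1.0, 1.0)                         # integral of l_i squared
    Hbar = integrate(poly_mul([-xi, 1.0], li2), -1.0, 1.0)  # derivative weight
    results.append((W, H, Hbar))
    print(W, H, Hbar)
```

For these abscissas the first two columns agree and the third vanishes to rounding error; for abscissas that are not zeros of the orthogonal polynomial, the same script shows the derivative weights failing to vanish.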

With the notation of Sec. 7.4, the error (8.4.7) can be expressed in the form

$$E = \frac{\gamma_m}{A_m^2}\,\frac{f^{(2m)}(\xi)}{(2m)!} \tag{8.4.9}$$

where $\gamma_m$ is the normalizing factor corresponding to $\phi_m(x)$ and is defined by
(7.5.13), and where $A_m$ is the coefficient of $x^m$ in $\phi_m(x)$.† If use is made of (8.2.19),
instead of (8.2.18), in (8.3.4), the error also can be expressed in the form

$$E = \frac{\gamma_m}{A_m^2}\,f[x_1, x_1, \ldots, x_m, x_m, \xi] \tag{8.4.10}$$

† Whereas $\phi_m(x)$ could always be so defined that either $\gamma_m$ or $A_m$ is unity, this choice
usually does not lead to a standard (tabulated) form. Hence, the formulas are given
without such a restriction.

In order to determine explicitly the weights $H_i$ defined by (8.4.8), we make
use of the Christoffel-Darboux identity

$$\sum_{k=0}^{m}\frac{\phi_k(x)\phi_k(y)}{\gamma_k} = \frac{\phi_{m+1}(x)\phi_m(y) - \phi_m(x)\phi_{m+1}(y)}{a_m\gamma_m(x - y)} \tag{8.4.11}$$

with $a_m = A_{m+1}/A_m$, established in Sec. 7.10. If we notice that here

$$\phi_m(x) = A_m\pi(x) \tag{8.4.12}$$

and identify y with $x_i$, where $x_i$ is a zero of π(x), so that $\phi_m(x_i) = 0$, Eq. (8.4.11)
specializes to the form

$$-\frac{\phi_{m+1}(x_i)}{a_m\gamma_m}\,\frac{\phi_m(x)}{x - x_i} = \sum_{k=0}^{m-1}\frac{\phi_k(x)\phi_k(x_i)}{\gamma_k} \tag{8.4.13}$$

The result of multiplying the equal members of (8.4.13) by $w(x)\phi_0(x)$, integrating
the results over [a, b], and making use of the orthogonality of the polynomials,
relative to w(x), is then

$$\frac{\phi_{m+1}(x_i)}{a_m\gamma_m}\int_a^b w(x)\,\frac{\phi_0(x)\phi_m(x)}{x - x_i}\,dx = -\phi_0(x_i)$$

or, since $\phi_0(x)$ is a constant,

$$\int_a^b w(x)\,\frac{\phi_m(x)}{x - x_i}\,dx = -\frac{a_m\gamma_m}{\phi_{m+1}(x_i)} \tag{8.4.14}$$

Finally, since

$$l_i(x) = \frac{\pi(x)}{\pi'(x_i)(x - x_i)} = \frac{\phi_m(x)}{\phi_m'(x_i)(x - x_i)} \tag{8.4.15}$$

reference to the second form of (8.4.8) leads to the desired result

$$H_i = \frac{1}{\phi_m'(x_i)}\int_a^b w(x)\,\frac{\phi_m(x)}{x - x_i}\,dx = -\frac{A_{m+1}\gamma_m}{A_m\,\phi_m'(x_i)\,\phi_{m+1}(x_i)} \tag{8.4.16}$$

If use is made of the recurrence formula (7.10.3) in the form

$$\phi_{m+1}(x) = (a_m x + b_m)\phi_m(x) - \frac{a_m\gamma_m}{a_{m-1}\gamma_{m-1}}\,\phi_{m-1}(x)$$

there follows

$$\phi_{m+1}(x_i) = -\frac{A_{m+1}A_{m-1}}{A_m^2}\,\frac{\gamma_m}{\gamma_{m-1}}\,\phi_{m-1}(x_i)$$

so that $H_i$ also can be expressed in the equivalent form

$$H_i = \frac{A_m\gamma_{m-1}}{A_{m-1}\,\phi_m'(x_i)\,\phi_{m-1}(x_i)} \tag{8.4.17}$$

In this connection, it may be noted that the result of setting $x = x_i$ in
the confluent form (7.10.11) of the Christoffel-Darboux identity, where $x_i$ is a
zero of $\phi_m(x)$, is the curious relationship

$$\sum_{k=0}^{m-1}\frac{[\phi_k(x_i)]^2}{\gamma_k} = -\frac{A_m\,\phi_m'(x_i)\,\phi_{m+1}(x_i)}{A_{m+1}\gamma_m} = \frac{1}{H_i} \tag{8.4.18}$$


8.5 Legendre-Gauss Quadrature 


In the case when a constant weighting function is to be used over a finite interval,
it is convenient to suppose that a suitable change of variables has transformed
that interval into the interval [−1, 1]. From the results of Sec. 7.6, we then have

$$\pi(x) = \frac{1}{A_m}P_m(x) \tag{8.5.1}$$

where $P_m(x)$ is the mth Legendre polynomial, and where

$$A_m = \frac{(2m)!}{2^m (m!)^2} \tag{8.5.2}$$

With the additional result

$$\gamma_m = \frac{2}{2m + 1} \tag{8.5.3}$$

Equations (8.4.6), (8.4.16) or (8.4.17), and (8.4.9) reduce to the quadrature
formula

$$\int_{-1}^{1} f(x)\,dx = \sum_{k=1}^{m} H_k f(x_k) + E \tag{8.5.4}$$

where $x_i$ is the ith zero of $P_m(x)$, and where

$$H_i = -\frac{2}{(m + 1)P_{m+1}(x_i)P_m'(x_i)} = \frac{2}{mP_{m-1}(x_i)P_m'(x_i)} \tag{8.5.5}$$

$$E = \frac{2^{2m+1}(m!)^4}{(2m + 1)[(2m)!]^3}\,f^{(2m)}(\xi) \tag{8.5.6}$$

From the known relation†

$$(1 - x^2)P_m'(x) = -mxP_m(x) + mP_{m-1}(x) = (m + 1)xP_m(x) - (m + 1)P_{m+1}(x) \tag{8.5.7}$$

there follows also

$$(1 - x_i^2)P_m'(x_i) = mP_{m-1}(x_i) = -(m + 1)P_{m+1}(x_i)$$

so that (8.5.5) can also be expressed in the forms

$$H_i = \frac{2}{(1 - x_i^2)[P_m'(x_i)]^2} = \frac{2(1 - x_i^2)}{(m + 1)^2[P_{m+1}(x_i)]^2} \tag{8.5.8}$$

as well as in a variety of still other forms.

In illustration, when m = 3, there follows

$$\pi(x) = \tfrac{2}{5}P_3(x) = x\left(x^2 - \tfrac{3}{5}\right)$$

and hence

$$x_1 = -\frac{\sqrt{15}}{5} \qquad x_2 = 0 \qquad x_3 = \frac{\sqrt{15}}{5}$$

and

$$H_1 = \tfrac{5}{9} \qquad H_2 = \tfrac{8}{9} \qquad H_3 = \tfrac{5}{9}$$

Thus

$$\int_{-1}^{1} f(x)\,dx = \frac{1}{9}\left[5f\!\left(-\frac{\sqrt{15}}{5}\right) + 8f(0) + 5f\!\left(\frac{\sqrt{15}}{5}\right)\right] + \frac{f^{\mathrm{vi}}(\xi)}{15750} \tag{8.5.9}$$

where |ξ| < 1, as given in (8.1.2). When m = 1, a formula equivalent to the
midpoint rule (3.5.18) results. The abscissas and weights corresponding to
formulas for which 2 ≤ m ≤ 5 are listed (to six digits) in Table 8.1. More
elaborate tabulations are listed in the references (see Sec. 8.17).

† For derivations of (8.5.7), and of other similar differential recurrence formulas listed
in this chapter, see Szego [1967].

Table 8.1

m    Abscissas      Weights
2    ±0.577350      1
3    0              8/9
     ±0.774597      5/9
4    ±0.339981      0.652145
     ±0.861136      0.347855
5    0              0.568889
     ±0.538469      0.478629
     ±0.906180      0.236927
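
The entries of Table 8.1 can be regenerated from the formulas of this section. The following is a minimal sketch, not a production routine: it locates each zero of P_m(x) by Newton iteration from a commonly used trigonometric starting guess (an assumption of this sketch, not a result from the text), evaluates P_m′ from the relation (8.5.7), and computes the weight from the first form of (8.5.8). Function names are hypothetical.

```python
import math

def legendre(m, x):
    """Return (P_m(x), P_{m-1}(x)) by the three-term recurrence, for m >= 1."""
    p_prev, p = 1.0, x   # P_0 and P_1
    for k in range(1, m):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p, p_prev

def legendre_derivative(m, x):
    """P_m'(x) from (1 - x^2) P_m' = -m x P_m + m P_{m-1}, valid for |x| < 1."""
    p, p_prev = legendre(m, x)
    return m * (p_prev - x * p) / (1.0 - x * x)

def legendre_gauss(m):
    """Abscissas (in decreasing order) and weights of the m-point formula."""
    nodes, weights = [], []
    for i in range(1, m + 1):
        x = math.cos(math.pi * (i - 0.25) / (m + 0.5))   # starting guess
        for _ in range(50):
            dx = legendre(m, x)[0] / legendre_derivative(m, x)
            x -= dx
            if abs(dx) < 1e-15:
                break
        dp = legendre_derivative(m, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

nodes3, weights3 = legendre_gauss(3)
for x, h in zip(nodes3, weights3):
    print(f"{x: .6f}  {h:.6f}")
```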


8.6 Laguerre-Gauss Quadrature 


In the case when the weighting function

$$w(x) = e^{-x} \tag{8.6.1}$$

is used over the semi-infinite interval [0, ∞), the results of Sec. 7.7 give

$$\pi(x) = \frac{1}{A_m}L_m(x) \tag{8.6.2}$$

where $L_m(x)$ is the mth Laguerre polynomial and where

$$A_m = (-1)^m \tag{8.6.3}$$

In addition, there follows

$$\gamma_m = (m!)^2 \tag{8.6.4}$$

and hence the formulas of Sec. 8.4 become

$$\int_0^\infty e^{-x} f(x)\,dx = \sum_{k=1}^{m} H_k f(x_k) + E \tag{8.6.5}$$

where $x_i$ is the ith zero of $L_m(x)$,† and where

$$H_i = \frac{(m!)^2}{L_m'(x_i)L_{m+1}(x_i)} = -\frac{[(m - 1)!]^2}{L_m'(x_i)L_{m-1}(x_i)} \tag{8.6.6}$$

and

$$E = \frac{(m!)^2}{(2m)!}\,f^{(2m)}(\xi) \tag{8.6.7}$$

where 0 < ξ < ∞. From the relation

$$xL_m'(x) = mL_m(x) - m^2L_{m-1}(x) = (x - m - 1)L_m(x) + L_{m+1}(x) \tag{8.6.8}$$

there follows also

$$x_iL_m'(x_i) = -m^2L_{m-1}(x_i) = L_{m+1}(x_i)$$

so that (8.6.6) can also be expressed in the forms

$$H_i = \frac{(m!)^2}{x_i[L_m'(x_i)]^2} = \frac{(m!)^2x_i}{[L_{m+1}(x_i)]^2} \tag{8.6.9}$$

In illustration, when m = 2, there follows

$$\pi(x) = L_2(x) = x^2 - 4x + 2$$

and hence

$$x_1 = 2 - \sqrt{2} \qquad x_2 = 2 + \sqrt{2}$$

and

$$H_1 = \frac{2 + \sqrt{2}}{4} \qquad H_2 = \frac{2 - \sqrt{2}}{4}$$

Thus

$$\int_0^\infty e^{-x} f(x)\,dx = \frac{1}{4}\left[(2 + \sqrt{2})f(2 - \sqrt{2}) + (2 - \sqrt{2})f(2 + \sqrt{2})\right] + \frac{1}{6}f^{\mathrm{iv}}(\xi) \tag{8.6.10}$$

where 0 < ξ < ∞ or, more generally,

$$\int_0^\infty e^{-\alpha x} f(x)\,dx = \frac{1}{4\alpha}\left[(2 + \sqrt{2})f\!\left(\frac{2 - \sqrt{2}}{\alpha}\right) + (2 - \sqrt{2})f\!\left(\frac{2 + \sqrt{2}}{\alpha}\right)\right] + \frac{1}{6\alpha^5}f^{\mathrm{iv}}(\xi) \tag{8.6.11}$$

The abscissas and weights corresponding to formulas for which 2 ≤ m ≤ 5
are listed (to six digits) in Table 8.2. Other tabulations are listed in the references.

† The more general case in which $e^{-x}$ is replaced by $e^{-\alpha x}$ in (8.6.5) is clearly reduced
to the present case by a simple change of variables.

Table 8.2

m    Abscissas      Weights
2     0.585786      0.853553
      3.414214      0.146447
3     0.415775      0.711093
      2.294280      0.278518
      6.289945      0.0103893
4     0.322548      0.603154
      1.745761      0.357419
      4.536620      0.0388879
      9.395071      0.000539295
5     0.263560      0.521756
      1.413403      0.398667
      3.596426      0.0759424
      7.085810      0.00361176
     12.640801      0.0000233700
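
The entries of Table 8.2 can be reproduced with a short script. The sketch below makes two assumptions that should be noted. First, it uses the Laguerre polynomials normalized so that L_n(0) = 1, which are 1/n! times those of this section, so that the second form of (8.6.9) becomes H_i = x_i / {(m + 1)² [L_{m+1}(x_i)]²}. Second, it finds the zeros by scanning (0, 4m + 2) on a fixed grid and bisecting, which is adequate only for modest m where the zeros are well separated. All names are hypothetical.

```python
def laguerre(n, x):
    """Laguerre polynomial with L_n(0) = 1 (1/n! times the text's L_n)."""
    if n == 0:
        return 1.0
    l_prev, l = 1.0, 1.0 - x
    for k in range(1, n):
        l_prev, l = l, ((2 * k + 1 - x) * l - k * l_prev) / (k + 1)
    return l

def laguerre_gauss(m, step=0.05):
    """Abscissas and weights of the m-point Laguerre-Gauss formula."""
    nodes = []
    a, fa = 0.0, laguerre(m, 0.0)
    while a < 4.0 * m + 2.0 and len(nodes) < m:
        b = a + step
        fb = laguerre(m, b)
        if fb == 0.0:
            nodes.append(b)
        elif fa * fb < 0.0:
            lo, hi, flo = a, b, fa
            for _ in range(80):            # bisection to machine precision
                mid = 0.5 * (lo + hi)
                fmid = laguerre(m, mid)
                if flo * fmid <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fmid
            nodes.append(0.5 * (lo + hi))
        a, fa = b, fb
    weights = [x / ((m + 1) ** 2 * laguerre(m + 1, x) ** 2) for x in nodes]
    return nodes, weights

nodes2, weights2 = laguerre_gauss(2)
nodes5, weights5 = laguerre_gauss(5)
for x, h in zip(nodes5, weights5):
    print(f"{x:10.6f}  {h:.6g}")
```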

It may be noted that the use of Laguerre-Gauss quadrature is by no
means restricted to cases in which the integrand is explicitly expressed as a
product of the form $e^{-x}f(x)$. Its use may be appropriate in approximating
an integral

$$\int_0^\infty F(x)\,dx$$

in which F(x) can be approximated by a polynomial of moderate degree for
small and moderately large positive x and is known to tend to zero as x → ∞
like $e^{-\alpha x}$ times a polynomial, where α is a known positive constant. In this case,
one would identify f(x) with $e^{\alpha x}F(x)$ in using (8.6.5). As another example, we
note that 


$$\int_{-\infty}^{\infty} e^{-\sqrt{x^2 + a^2}}\,F(x)\,dx = \int_0^\infty e^{-x} f(x)\,dx$$

with the definition

$$f(x) = e^{\,x - \sqrt{x^2 + a^2}}\,[F(x) + F(-x)]$$


Similar comments apply to the other quadrature formulas to be considered. 
When the weighting function (8.6.1) is generalized to the function

$$w(x) = x^{\beta}e^{-x} \qquad (\beta > -1) \tag{8.6.12}$$

it is easily found, by the methods of Sec. 7.7, that

$$\pi(x) = \frac{1}{A_m}L_m^{\beta}(x) \tag{8.6.13}$$

where $L_m^{\beta}(x)$ is the generalized Laguerre polynomial of degree m

$$L_m^{\beta}(x) = e^{x}x^{-\beta}\frac{d^m}{dx^m}\left(e^{-x}x^{\beta+m}\right) \tag{8.6.14}$$

and that

$$A_m = (-1)^m \qquad \gamma_m = m!\int_0^\infty x^{\beta+m}e^{-x}\,dx = m!\,\Gamma(m + \beta + 1) \tag{8.6.15}$$

It can also be shown that the differential recurrence formula

$$x\,\frac{d}{dx}L_m^{\beta}(x) = mL_m^{\beta}(x) - m(m + \beta)L_{m-1}^{\beta}(x) = (x - m - \beta - 1)L_m^{\beta}(x) + L_{m+1}^{\beta}(x) \tag{8.6.16}$$

is satisfied.

From these results, the corresponding quadrature formula is derived
in the form

$$\int_0^\infty x^{\beta}e^{-x} f(x)\,dx = \sum_{k=1}^{m} H_k f(x_k) + E \tag{8.6.17}$$

where $x_i$ is the ith zero of $L_m^{\beta}(x)$, and where

$$H_i = \frac{m!\,\Gamma(m + \beta + 1)}{x_i\left[L_m^{\beta\,\prime}(x_i)\right]^2} = \frac{m!\,\Gamma(m + \beta + 1)\,x_i}{\left[L_{m+1}^{\beta}(x_i)\right]^2} \tag{8.6.18}$$

and

$$E = \frac{m!\,\Gamma(m + \beta + 1)}{(2m)!}\,f^{(2m)}(\xi) \tag{8.6.19}$$


8.7 Hermite-Gauss Quadrature 


In the case when the weighting function

$$w(x) = e^{-x^2} \tag{8.7.1}$$

is used over the interval (−∞, ∞), the results of Sec. 7.8 give

$$\pi(x) = \frac{1}{A_m}H_m(x) \tag{8.7.2}$$

where $H_m(x)$ is the mth Hermite polynomial, and where

$$A_m = 2^m \tag{8.7.3}$$

In addition, there follows

$$\gamma_m = \sqrt{\pi}\,2^m m! \tag{8.7.4}$$

so that the appropriate gaussian formula is of the form

$$\int_{-\infty}^{\infty} e^{-x^2} f(x)\,dx = \sum_{k=1}^{m} H_k f(x_k) + E \tag{8.7.5}$$

where $x_i$ is the ith zero of $H_m(x)$, and where

$$H_i = -\frac{2^{m+1}m!\sqrt{\pi}}{H_m'(x_i)H_{m+1}(x_i)} = \frac{2^m(m - 1)!\sqrt{\pi}}{H_m'(x_i)H_{m-1}(x_i)} \tag{8.7.6}$$

and

$$E = \frac{m!\sqrt{\pi}}{2^m(2m)!}\,f^{(2m)}(\xi) \tag{8.7.7}$$

for some ξ. From the relation

$$H_m'(x) = 2mH_{m-1}(x) = 2xH_m(x) - H_{m+1}(x) \tag{8.7.8}$$

there follows also

$$H_m'(x_i) = 2mH_{m-1}(x_i) = -H_{m+1}(x_i)$$

so that (8.7.6) can also be expressed in the forms

$$H_i = \frac{2^{m+1}m!\sqrt{\pi}}{[H_m'(x_i)]^2} = \frac{2^{m+1}m!\sqrt{\pi}}{[H_{m+1}(x_i)]^2} \tag{8.7.9}$$
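
In illustration, when m = 3, there follows

$$\pi(x) = \tfrac{1}{8}H_3(x) = x\left(x^2 - \tfrac{3}{2}\right)$$

and hence

$$x_1 = -\frac{\sqrt{6}}{2} \qquad x_2 = 0 \qquad x_3 = \frac{\sqrt{6}}{2}$$

and

$$H_1 = H_3 = \frac{\sqrt{\pi}}{6} \qquad H_2 = \frac{2\sqrt{\pi}}{3}$$

Thus

$$\int_{-\infty}^{\infty} e^{-x^2} f(x)\,dx = \frac{\sqrt{\pi}}{6}\left[f\!\left(-\frac{\sqrt{6}}{2}\right) + 4f(0) + f\!\left(\frac{\sqrt{6}}{2}\right)\right] + \frac{\sqrt{\pi}}{960}f^{\mathrm{vi}}(\xi) \tag{8.7.10}$$

or, more generally,

$$\int_{-\infty}^{\infty} e^{-\alpha^2x^2} f(x)\,dx = \frac{\sqrt{\pi}}{6\alpha}\left[f\!\left(-\frac{\sqrt{6}}{2\alpha}\right) + 4f(0) + f\!\left(\frac{\sqrt{6}}{2\alpha}\right)\right] + \frac{\sqrt{\pi}}{960\alpha^7}f^{\mathrm{vi}}(\xi) \tag{8.7.11}$$

The abscissas and weights corresponding to formulas for which 2 ≤ m ≤ 5
are listed (to six digits) in Table 8.3. More extensive tabulations are listed in
the references.

Table 8.3

m    Abscissas      Weights
2    ±0.707107      0.886227
3    0              1.181636
     ±1.224745      0.295409
4    ±0.524648      0.804914
     ±1.650680      0.0813128
5    0              0.945309
     ±0.958572      0.393619
     ±2.020183      0.0199532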
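
The three-point formula (8.7.10) is simple enough to test directly. The following sketch (the function name is hypothetical) applies it, without the error term, to three integrands chosen here for illustration: x⁴, for which the formula is exact; x⁶, for which the error term equals 720√π/960 exactly; and cos x, whose exact integral √π e^(−1/4) is a standard result.

```python
import math

def hermite_gauss_3(f):
    """Three-point Hermite-Gauss rule for the weight exp(-x^2), error term dropped."""
    r = math.sqrt(6.0) / 2.0
    return math.sqrt(math.pi) / 6.0 * (f(-r) + 4.0 * f(0.0) + f(r))

root_pi = math.sqrt(math.pi)

# Degree 4 is within the degree of precision 2m - 1 = 5.
err_x4 = 0.75 * root_pi - hermite_gauss_3(lambda x: x ** 4)

# For x^6 the sixth derivative is the constant 720, so the error is 720*sqrt(pi)/960.
err_x6 = 1.875 * root_pi - hermite_gauss_3(lambda x: x ** 6)

# For cos x the sixth derivative is -cos x, so |E| cannot exceed sqrt(pi)/960.
err_cos = root_pi * math.exp(-0.25) - hermite_gauss_3(math.cos)
print(err_x4, err_x6, err_cos)
```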


It should be noticed that no restrictions are imposed on ξ in the error
formula (8.7.7), other than that it be real. (Similarly, in the error formula of
the preceding section, it is known only that ξ is real and positive.) Thus, in
those cases when $f^{(2m)}(x)$ varies greatly in magnitude when m is large, as x
takes on all real values, the imprecision associated with the use of the error
formula (8.7.7) is correspondingly great.

In this connection, it may be recalled that the truncation error in the general
gaussian quadrature formula (8.4.6) can be expressed in the form

$$E = \frac{\gamma_m}{A_m^2}\,f[x_1, x_1, \ldots, x_m, x_m, \xi_1] \tag{8.7.12}$$

according to (8.4.10), where $a < \xi_1 < b$. This form reduces to the form of
(8.4.9)

$$E = \frac{\gamma_m}{A_m^2}\,\frac{f^{(2m)}(\xi_2)}{(2m)!} \tag{8.7.13}$$

when the divided difference is replaced by $f^{(2m)}(\xi_2)/(2m)!$, where $a < \xi_2 < b$.
Since $\xi_1$ and $\xi_2$ generally cannot be estimated, one generally must replace either
the divided difference or the derivative by its maximum absolute value for all ξ
in [a, b], to obtain an upper bound on |E|, and it may happen that the bound
obtained from (8.7.12) is much less conservative than that obtained from
(8.7.13).

In illustration, if f(x) = 1/(x + a), there follows

$$f[x_1, x_1, \ldots, x_m, x_m, \xi_1] = \frac{1}{(x_1 + a)^2\cdots(x_m + a)^2(\xi_1 + a)}$$

and

$$\frac{f^{(2m)}(\xi_2)}{(2m)!} = \frac{1}{(a + \xi_2)^{2m+1}}$$

Thus, for example, if five-point Laguerre-Gauss quadrature were to be used to
approximate the integral

$$\int_0^\infty \frac{e^{-x}}{x + 1}\,dx$$

the truncation-error terms corresponding to the use of (8.7.12) and (8.7.13)
would be of the forms

$$\frac{(120)^2}{(2.39 \times 10^6)(1 + \xi_1)} \approx \frac{0.0060}{1 + \xi_1} \qquad\text{and}\qquad \frac{1.44 \times 10^4}{(1 + \xi_2)^{11}}$$

respectively, where the abscissas are taken from Table 8.2 and where $\xi_1$ and
$\xi_2$ are known only to be positive. Accordingly, the use of (8.7.12) here permits
the determination of an error bound which is smaller than that obtainable from
(8.7.13) in a ratio of about $2.4 \times 10^6$. The actual truncation error rounds to
0.0013.

Whereas this case is a rather extreme one, still, in those special cases
when f(x) is such that an upper bound on the magnitude of the relevant divided
difference can be obtained or estimated practically, the use of (8.7.12) is usually
preferable to that of (8.7.13).
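
The figures quoted in this illustration can be rechecked with a few lines of arithmetic. The sketch below copies the five-point abscissas and weights from Table 8.2; the reference value of the integral, e·E₁(1) ≈ 0.5963473623, is supplied here as an assumption from standard tables of the exponential integral and is not taken from the text.

```python
# Five-point Laguerre-Gauss abscissas and weights, copied from Table 8.2.
xs = [0.263560, 1.413403, 3.596426, 7.085810, 12.640801]
ws = [0.521756, 0.398667, 0.0759424, 0.00361176, 0.0000233700]

# Quadrature approximation to the integral of exp(-x) / (x + 1) over [0, inf).
approx = sum(w / (x + 1.0) for x, w in zip(xs, ws))
reference = 0.5963473623          # assumed reference value, e * E1(1)
actual_error = reference - approx

# gamma_m / A_m^2 = (m!)^2 with m = 5 for the Laguerre case.
scale = 120.0 ** 2

# Bound from the divided-difference form: worst case is xi_1 -> 0.
prod_sq = 1.0
for x in xs:
    prod_sq *= (1.0 + x) ** 2
bound_divided_difference = scale / prod_sq

# Bound from the derivative form: worst case is xi_2 -> 0.
bound_derivative = scale

print(actual_error, bound_divided_difference, bound_derivative / bound_divided_difference)
```

The divided-difference bound of about 0.0060 is within a factor of five of the actual error, whereas the derivative bound of 14,400 conveys essentially no information.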
8.8 Chebyshev-Gauss Quadrature 


For the weighting function

$$w(x) = \frac{1}{\sqrt{1 - x^2}} \tag{8.8.1}$$

over the interval [−1, 1], the results of Sec. 7.9 give

$$\pi(x) = \frac{1}{A_m}T_m(x) \tag{8.8.2}$$

where $T_m(x)$ is the mth Chebyshev polynomial. With the additional results

$$A_m = 2^{m-1} \qquad \gamma_m = \frac{\pi}{2} \tag{8.8.3}$$

the relevant gaussian formula is obtained in the form

$$\int_{-1}^{1}\frac{f(x)}{\sqrt{1 - x^2}}\,dx = \sum_{k=1}^{m} H_k f(x_k) + E \tag{8.8.4}$$

where $x_i$ is the ith zero of $T_m(x)$ and where

$$H_i = -\frac{\pi}{T_m'(x_i)T_{m+1}(x_i)} \tag{8.8.5}$$

and

$$E = \frac{2\pi}{2^{2m}(2m)!}\,f^{(2m)}(\xi) \tag{8.8.6}$$

where |ξ| < 1.

Since

$$T_m(x) = \cos\left(m\cos^{-1}x\right) \tag{8.8.7}$$

the abscissas are obtainable in the explicit form

$$x_i = \cos\left[\frac{(2i - 1)\pi}{2m}\right] \qquad (i = 1, 2, \ldots, m) \tag{8.8.8}$$

Also, direct calculation shows that

$$T_m'(x_i) = \frac{(-1)^{i+1}m}{\sin\alpha_i} \qquad T_{m+1}(x_i) = (-1)^i\sin\alpha_i \tag{8.8.9}$$

where

$$\alpha_i = \frac{2i - 1}{2m}\,\pi \tag{8.8.10}$$

and hence (8.8.5) reduces to the remarkably simple form

$$H_i = \frac{\pi}{m} \tag{8.8.11}$$

Thus the weights in (8.8.4) are all equal.

The formula (8.8.4) hence can be written in the explicit form

$$\int_{-1}^{1}\frac{f(x)}{\sqrt{1 - x^2}}\,dx = \frac{\pi}{m}\sum_{k=1}^{m} f\!\left(\cos\frac{(2k - 1)\pi}{2m}\right) + \frac{2\pi}{2^{2m}(2m)!}\,f^{(2m)}(\xi) \tag{8.8.12}$$

where |ξ| < 1.
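
Because the abscissas and weights are available in closed form, the formula requires no table at all. The following is a minimal sketch (the function name is hypothetical) of the sum in the explicit Chebyshev-Gauss formula, with the error term dropped; the integrands are chosen here only for illustration.

```python
import math

def chebyshev_gauss(f, m):
    """m-point Chebyshev-Gauss rule for the weight 1/sqrt(1 - x^2) on [-1, 1]."""
    total = 0.0
    for k in range(1, m + 1):
        total += f(math.cos((2 * k - 1) * math.pi / (2 * m)))
    return math.pi / m * total

# Polynomials of degree 2m - 1 or less are integrated exactly.
q_x2 = chebyshev_gauss(lambda x: x ** 2, 2)   # exact value is pi / 2
q_x4 = chebyshev_gauss(lambda x: x ** 4, 3)   # exact value is 3 * pi / 8

# For a smooth integrand the results settle rapidly as m increases.
q5 = chebyshev_gauss(math.exp, 5)
q10 = chebyshev_gauss(math.exp, 10)
print(q_x2, q_x4, q5, q10)
```

For f(x) = e^x the error term with m = 5 is at most 2πe/(2¹⁰ · 10!), or about 5 × 10⁻⁹, which is consistent with the agreement between the five-point and ten-point results.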

The differential recurrence formula

$$(1 - x^2)T_m'(x) = -mxT_m(x) + mT_{m-1}(x) = mxT_m(x) - mT_{m+1}(x) \tag{8.8.13}$$

(which was not needed here) is included for reference purposes.


8.9 Jacobi-Gauss Quadrature 


Many of the other gaussian quadrature formulas which have been investigated 
in the literature correspond to the use of a specialization of the weighting 
function 


w(x) = -x1+xP @>—i,p> —-) (8.9.1) 


over the interval [—1, 1], or to the result of transforming this problem to the 
interval [0,1]. The special cases « = B = 0 and « = B = —} have been 
considered in Secs. 8.5 and 8.8. 

In the general case, we may take x(x) as the appropriate multiple of the 
polynomial 


PX) 


| 


C,(1 — Χ Ὁ + xyF τς [ad -- x)**™ + x)Pt™] 
x 


(—1)"C,.2"m! S (” + « ᾿ B+ gs ᾿ Oe - Ἵ 


k=0 


C._V,(x) (8.9.2) 


ttl 


which, as was noted in Sec. 7.9, reduces with a certain (not universally agreed 
upon) choice of C,, to the mth Jacobi polynomial.t 
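The coefficient of $x^m$ is found to be

$$A_m = (-1)^m\, \frac{\Gamma(2m + \alpha + \beta + 1)}{\Gamma(m + \alpha + \beta + 1)}\, C_m \tag{8.9.3}$$

whereas the normalizing factor is obtained, from (7.5.14), in the form

$$\begin{aligned}
\gamma_m &= C_m^2\, m!\, \frac{\Gamma(2m + \alpha + \beta + 1)}{\Gamma(m + \alpha + \beta + 1)} \int_{-1}^{1} (1 - x)^{\alpha+m}(1 + x)^{\beta+m}\,dx \\
&= C_m^2\, m!\, \frac{\Gamma(2m + \alpha + \beta + 1)}{\Gamma(m + \alpha + \beta + 1)}\, \frac{\Gamma(m + \alpha + 1)\,\Gamma(m + \beta + 1)}{\Gamma(2m + \alpha + \beta + 2)}\, 2^{2m+\alpha+\beta+1} \\
&= C_m^2\, m!\, \frac{2^{2m+\alpha+\beta+1}}{2m + \alpha + \beta + 1}\, \frac{\Gamma(m + \alpha + 1)\,\Gamma(m + \beta + 1)}{\Gamma(m + \alpha + \beta + 1)}
\end{aligned} \tag{8.9.4}$$

where use was made of the formula

$$\int_{-1}^{1} (1 - x)^p (1 + x)^q\,dx = 2^{p+q+1}\, \frac{\Gamma(p + 1)\,\Gamma(q + 1)}{\Gamma(p + q + 2)} \qquad (p > -1,\ q > -1) \tag{8.9.5}$$

† See Szego [1967]. The choice made in that reference is $C_m = (-1)^m/(2^m m!)$.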


The results (8.9.2) to (8.9.4) reduce to (8.5.1) to (8.5.3) when $\alpha = \beta = 0$
with the choice $C_m = (-1)^m/(2^m m!)$, corresponding to $\phi_m(x) = P_m(x)$, and
to (8.8.2) and (8.8.3) when $\alpha = \beta = -\tfrac12$ with the special choice
$C_m = (-2)^m m!/(2m)!$, corresponding to $\phi_m(x) = T_m(x)$.

Thus we obtain the quadrature formula

$$\int_{-1}^{1} (1 - x)^{\alpha}(1 + x)^{\beta} f(x)\,dx = \sum_{k=1}^{m} H_k f(x_k) + E \tag{8.9.6}$$

where $x_i$ is the ith zero of $V_m(x)$, and where, from (8.4.16) or (8.4.17) and
(8.4.10),

$$H_i = \frac{2m + \alpha + \beta + 2}{m + \alpha + \beta + 1}\, \frac{\Gamma(m + \alpha + 1)\,\Gamma(m + \beta + 1)}{\Gamma(m + \alpha + \beta + 1)}\, \frac{2^{2m+\alpha+\beta+1}\, m!}{V_m'(x_i)\, V_{m+1}(x_i)} \tag{8.9.7}$$

and

$$E = \frac{\Gamma(m + \alpha + 1)\,\Gamma(m + \beta + 1)\,\Gamma(m + \alpha + \beta + 1)}{(2m + \alpha + \beta + 1)\,[\Gamma(2m + \alpha + \beta + 1)]^2}\, \frac{2^{2m+\alpha+\beta+1}\, m!}{(2m)!}\, f^{(2m)}(\xi) \tag{8.9.8}$$

where $|\xi| < 1$. Integration based on (8.9.6) is sometimes known as Mehler
quadrature.
It is possible to establish the relation (see Szego [1967])

$$\begin{aligned}
(2m + \alpha + \beta + 2)(1 - x^2)\,V_m'(x) ={}& (m + \alpha + \beta + 1)\left[(2m + \alpha + \beta + 2)x + (\alpha - \beta)\right] V_m(x) \\
&+ (m + \alpha + \beta + 1)\,V_{m+1}(x)
\end{aligned} \tag{8.9.9}$$

from which there follows

$$V_{m+1}(x_i) = \frac{2m + \alpha + \beta + 2}{m + \alpha + \beta + 1}\,(1 - x_i^2)\,V_m'(x_i)$$

so that (8.9.7) can also be written in the somewhat simpler form

$$H_i = \frac{\Gamma(m + \alpha + 1)\,\Gamma(m + \beta + 1)}{\Gamma(m + \alpha + \beta + 1)}\, \frac{2^{2m+\alpha+\beta+1}\, m!}{(1 - x_i^2)\,[V_m'(x_i)]^2} \tag{8.9.10}$$

As an example, we consider the weighting function

$$w(x) = 1 - x^2 \tag{8.9.11}$$


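in which case (8.9.2) gives

$$V_m(x) = (1 - x^2)^{-1}\, \frac{d^m}{dx^m}\,(1 - x^2)^{m+1}$$

By making use of the relationship

$$\frac{d^{r-k}}{dx^{r-k}}\,(x^2 - 1)^r = 2^r r!\, \frac{(r - k)!}{(r + k)!}\,(x^2 - 1)^k\, \frac{d^k P_r(x)}{dx^k} \tag{8.9.12}$$

this result can be written in the form

$$V_m(x) = \frac{(-1)^m\, 2^{m+1}\, m!}{m + 2}\, \frac{dP_{m+1}(x)}{dx}$$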


Hence there follows

$$\int_{-1}^{1} (1 - x^2) f(x)\,dx = \sum_{k=1}^{m} H_k f(x_k) + E \tag{8.9.13}$$

where $x_i$ is the ith zero of $P_{m+1}'(x)$, and where, from (8.9.10) and (8.9.8),

$$H_i = \frac{2(m + 1)(m + 2)}{(1 - x_i^2)\,[P_{m+1}''(x_i)]^2} \tag{8.9.14}$$

and

$$E = \frac{m!\,(m + 2)!}{(2m)!} \left[\frac{(m + 1)!}{(2m + 2)!}\right]^2 \frac{2^{2m+3}}{2m + 3}\, f^{(2m)}(\xi) \tag{8.9.15}$$

Since $P_{m+1}(x)$ satisfies the differential equation

$$(1 - x^2)\,P_{m+1}''(x) - 2x\,P_{m+1}'(x) + (m + 1)(m + 2)\,P_{m+1}(x) = 0$$

there follows

$$(1 - x_i^2)\,P_{m+1}''(x_i) = -(m + 1)(m + 2)\,P_{m+1}(x_i)$$

when $P_{m+1}'(x_i) = 0$, so that (8.9.14) can also be expressed in the form

$$H_i = \frac{2(1 - x_i^2)}{(m + 1)(m + 2)\,[P_{m+1}(x_i)]^2}$$


8.10 Formulas with Assigned Abscissas


In some applications it is desirable to prescribe one or more of the m abscissas 
to be involved in a quadrature formula. In particular, whereas none of the true 
gaussian formulas involves the values of f(x) at the ends of the interval, it is 
sometimes important that one or both of these end values be used. It may be 
expected that, for each arbitrarily prescribed abscissa, the degree of precision 
generally will be reduced by unity below the maximum value of 2m — 1. In 
particular, if all abscissas were prescribed, the maximum degree of precision 
would generally be reduced to m — 1. Naturally, exceptions occur when the 
abscissas are preassigned in favorable ways. 

Whereas the gaussian formulas were derived in Sec. 8.4 from the hermitian
formulas, by requiring that the m weights $\bar H_k$ vanish, a somewhat different
approach (which also could have been used in the gaussian case) is desirable
here. We recall first that the lagrangian quadrature formula

$$\int_a^b w(x) f(x)\,dx = \sum_{k=1}^{m} W_k f(x_k) + E \tag{8.10.1}$$

where

$$\pi(x) = (x - x_1)(x - x_2) \cdots (x - x_m) \tag{8.10.2}$$

and

$$W_i = \int_a^b w(x)\, l_i(x)\,dx = \frac{1}{\pi'(x_i)} \int_a^b w(x)\, \frac{\pi(x)}{x - x_i}\,dx \tag{8.10.3}$$

always has a degree of precision of at least m − 1. Now any function f(x)
can be expressed as the sum

$$f(x) = p_{m-1}(x) + \pi(x)\, f[x_1, x_2, \ldots, x_m, x] \tag{8.10.4}$$

where $p_{m-1}(x)$ is the polynomial, of degree m − 1 or less, agreeing with f(x)
at the m points $x_1, \ldots, x_m$, and where $f[x_1, \ldots, x_m, x]$ is the mth divided
difference of f(x), relative to $x_1, \ldots, x_m$, defined in Chap. 2. Hence f(x) can
be replaced by that sum in (8.10.1). But since the two terms involving $p_{m-1}(x)$
in the result will cancel, and since $\pi(x_i) = 0$, we thus obtain the expression

$$E = \int_a^b w(x)\, \pi(x)\, f[x_1, \ldots, x_m, x]\,dx \tag{8.10.5}$$

for the error E in (8.10.1).†

If f(x) is a polynomial of degree m + r, its divided difference of order m
is a polynomial of degree r, and conversely. Hence we deduce from (8.10.5)
that the quadrature formula (8.10.1) has a degree of precision of at least
m + r − 1 if and only if the polynomial $\pi(x)$, whose m zeros are the abscissas, is
orthogonal, relative to w(x), to all polynomials of degree less than r. When
r = m, this result reduces to the result of Sec. 8.4 and serves to specify the
gaussian quadrature formulas, which were also derivable as special Hermite
formulas for which $\bar H_i = 0$, and for which also $H_i = W_i$.

† This result is equivalent to (5.10.34). The derivation is repeated here, for completeness,
in the modified notation of the present chapter.

Now suppose that m − r of the m abscissas are preassigned, leaving the
r “free” abscissas $x_1, x_2, \ldots, x_r$ to be determined so that the degree of precision
will be maximized. If we write

$$\pi(x) = [(x - x_1) \cdots (x - x_r)]\,[(x - x_{r+1}) \cdots (x - x_m)] \equiv \bar\pi(x)\, v(x) \tag{8.10.6}$$

where

$$\bar\pi(x) = (x - x_1)(x - x_2) \cdots (x - x_r) \tag{8.10.7}$$

is a polynomial of degree r whose r zeros are the free abscissas, which are to be
determined, and where

$$v(x) = (x - x_{r+1})(x - x_{r+2}) \cdots (x - x_m) \tag{8.10.8}$$

is a known polynomial, of degree m − r, whose zeros are the preassigned
abscissas, the condition

$$\int_a^b w(x)\, \pi(x)\, u_{r-1}(x)\,dx = 0 \tag{8.10.9}$$

where $u_{r-1}(x)$ is an arbitrary polynomial of degree r − 1 or less, takes the
form

$$\int_a^b [w(x)\, v(x)]\, \bar\pi(x)\, u_{r-1}(x)\,dx = 0 \tag{8.10.10}$$

Thus we may consider $\bar\pi(x)$ as the appropriate multiple of the rth member
of a set of polynomials $\phi_0(x), \phi_1(x), \ldots, \phi_r(x), \ldots$, of degrees 0, 1, ...,
r, ..., respectively, which are orthogonal over [a, b] relative to the modified
weighting function

$$\bar w(x) = w(x)\, v(x) \tag{8.10.11}$$

and the methods of Sec. 7.5 are again available for its determination. However,
if v(x) changes sign in [a, b], the modified weighting function $\bar w(x)$ will have the
same property. Thus there is then no assurance that the zeros of $\bar\pi(x)$ will be
real or, if so, that they will lie inside [a, b]. In the important cases for which
only one or both of the end points x = a and x = b are taken as preassigned
abscissas, so that v(x) is given by x − a, x − b, or (x − a)(x − b), this
difficulty does not arise since then v(x) is of fixed sign in [a, b]. Attention will
be restricted to these cases in what follows.

In order to evaluate the weights $W_i$, we write $\bar\pi(x) = \phi_r(x)/A_r$ to take
into account the fact that the polynomial $\phi_r(x)$ which is most conveniently
employed may not have unity as its leading coefficient, and notice that then

$$\pi(x) = \frac{1}{A_r}\, v(x)\, \phi_r(x)$$

where v(x) is defined by (8.10.8). Equation (8.10.3) then becomes

$$W_i = \frac{1}{\left[\dfrac{d}{dx}\,\{v(x)\,\phi_r(x)\}\right]_{x = x_i}} \int_a^b w(x)\, v(x)\, \frac{\phi_r(x)}{x - x_i}\,dx \tag{8.10.12}$$

and is, of course, independent of $A_r$. For i = 1, 2, ..., r the abscissa $x_i$ is a
zero of $\phi_r(x)$. Hence there follows

$$W_i = \frac{1}{v(x_i)\, \phi_r'(x_i)} \int_a^b w(x)\, v(x)\, \frac{\phi_r(x)}{x - x_i}\,dx \tag{8.10.13}$$

for i = 1, 2, ..., r, and a comparison of this form with the corresponding gaussian
result of Sec. 8.4, with m replaced by r and w(x) by $\bar w(x)$, leads to the desired result

$$W_i = -\frac{A_{r+1}\, \bar\gamma_r}{A_r\, v(x_i)\, \phi_r'(x_i)\, \phi_{r+1}(x_i)} \qquad (i = 1, 2, \ldots, r) \tag{8.10.14}$$

where $A_r$ is the coefficient of $x^r$ in $\phi_r(x)$, and where

$$\bar\gamma_r = \int_a^b \bar w(x)\,[\phi_r(x)]^2\,dx = \int_a^b w(x)\, v(x)\,[\phi_r(x)]^2\,dx \tag{8.10.15}$$

Equation (8.10.14) determines all weights except those corresponding to the
preassigned abscissas.
In the case when only the abscissa x = a is preassigned, so that

$$v(x) = x - a$$

the corresponding weight is expressed by (8.10.12) in the form

$$W_i = \frac{1}{\phi_r(a)} \int_a^b w(x)\, \phi_r(x)\,dx \qquad (x_i = a) \tag{8.10.16}$$

whereas when only x = b is fixed, so that

$$v(x) = x - b$$

there follows

$$W_i = \frac{1}{\phi_r(b)} \int_a^b w(x)\, \phi_r(x)\,dx \qquad (x_i = b) \tag{8.10.17}$$

In the case when both x = a and x = b are fixed, so that

$$v(x) = (x - a)(x - b)$$

there follows

$$W_i = \frac{1}{(b - a)\, \phi_r(a)} \int_a^b (b - x)\, w(x)\, \phi_r(x)\,dx \qquad (x_i = a) \tag{8.10.18}$$

and

$$W_i = \frac{1}{(b - a)\, \phi_r(b)} \int_a^b (x - a)\, w(x)\, \phi_r(x)\,dx \qquad (x_i = b) \tag{8.10.19}$$
Alternatively, the weights corresponding to the prescribed end ordinate 
or ordinates can be determined in terms of the remaining weights by use of one 
or both of the relations 


$$\sum_{k=1}^{m} W_k = \int_a^b w(x)\,dx \qquad \sum_{k=1}^{m} x_k W_k = \int_a^b x\, w(x)\,dx \tag{8.10.20}$$


which require that the error in (8.10.1) vanish when f(x) = 1 and when f(x) = x, 
respectively. 
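The use of (8.10.20) can be sketched as follows; this is an illustrative fragment rather than part of the original text. The function name `end_weights` and its argument names are hypothetical, and the numerical data are the four-point values that appear in Table 8.5 of Sec. 8.12, with unit weighting function on [−1, 1].

```python
def end_weights(a, b, moment0, moment1, free_nodes, free_weights):
    """Solve (8.10.20) for the weights W_a, W_b attached to the end points a and b.

    moment0 and moment1 are the integrals of w(x) and x*w(x) over [a, b];
    free_nodes and free_weights describe the remaining, already known, terms.
    """
    s0 = moment0 - sum(free_weights)
    s1 = moment1 - sum(x * w for x, w in zip(free_nodes, free_weights))
    # Remaining 2-by-2 system: W_a + W_b = s0 and a*W_a + b*W_b = s1
    w_b = (s1 - a * s0) / (b - a)
    w_a = s0 - w_b
    return w_a, w_b

# Free abscissas +-1/sqrt(5) with weights 5/6; the moments of w(x) = 1 are 2 and 0.
r = 5 ** -0.5
w_a, w_b = end_weights(-1.0, 1.0, 2.0, 0.0, [-r, r], [5 / 6, 5 / 6])
```

When only one end point is preassigned, the first relation of (8.10.20) alone suffices, and the second can serve as a check.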

The special cases in which w(x) is constant are treated in the two following 
sections. 

In the general case, it is possible to show that the relevant quadrature
formula can be obtained by replacing f(x) in the integrand by the polynomial
of degree m + r − 1 which agrees with f(x) when $x = x_1, \ldots, x_m$, and whose
derivative agrees with f′(x) at the unassigned points $x = x_1, \ldots, x_r$. Thus the
error can be expressed in the form

$$E = \int_a^b w(x)\,(x - x_1)^2 \cdots (x - x_r)^2\,(x - x_{r+1}) \cdots (x - x_m)\; f[x_1, x_1, \ldots, x_r, x_r, x_{r+1}, \ldots, x_m, x]\,dx \tag{8.10.21}$$

In particular, if $w(x) \ge 0$ in [a, b], if no assigned abscissas lie inside [a, b],
and if $f^{(m+r)}(x)$ exists in (a, b), there follows also

$$\begin{aligned}
E &= \frac{f^{(m+r)}(\xi)}{(m + r)!} \int_a^b w(x)\,[(x - x_1) \cdots (x - x_r)]^2\,(x - x_{r+1}) \cdots (x - x_m)\,dx \\
&= \frac{f^{(m+r)}(\xi)}{(m + r)!} \int_a^b \bar w(x)\,[\bar\pi(x)]^2\,dx \\
&= \frac{\bar\gamma_r}{A_r^2}\, \frac{f^{(m+r)}(\xi)}{(m + r)!}
\end{aligned} \tag{8.10.22}$$

where $\xi$ lies between the largest and smallest of $x_1, \ldots, x_m$, a, and b.


8.11 Radau Quadrature


In the case of a finite interval, with a unit weighting function, when one end of
the interval is assigned as an abscissa, it is again convenient to suppose that the
interval has been transformed to [−1, 1], with x = −1 as the fixed abscissa,
by an appropriate change in variables. We then have

$$v(x) = x + 1 \tag{8.11.1}$$

and

$$\pi(x) = (x + 1)\,\bar\pi(x) \tag{8.11.2}$$

where $\bar\pi(x)$ is a multiple of the rth member of a set of orthogonal polynomials
$\phi_0(x), \phi_1(x), \ldots, \phi_r(x), \ldots$, which has the property

$$\int_{-1}^{1} (x + 1)\, \phi_r(x)\, u_{r-1}(x)\,dx = 0 \tag{8.11.3}$$

where $u_{r-1}(x)$ is an arbitrary polynomial of degree r − 1 or less.
If we follow the procedure of Sec. 7.5, by writing

$$(x + 1)\, \phi_r(x) = \frac{d^r U_r(x)}{dx^r}$$

and integrating the left-hand member of (8.11.3) by parts r times, we find
that $U_r(x)$ must satisfy the equation

$$\frac{d^{r+1}}{dx^{r+1}}\left[\frac{1}{x + 1}\, \frac{d^r U_r}{dx^r}\right] = 0$$

and the requirements that $U_r, U_r', \ldots, U_r^{(r-1)}$ vanish when x = ±1, and hence
that $U_r$ must be of the form

$$U_r = C_r\,(x + 1)(x^2 - 1)^r$$

Thus it follows that

$$\phi_r(x) = \frac{C_r}{x + 1}\, \frac{d^r}{dx^r}\left[(x + 1)(x^2 - 1)^r\right] \tag{8.11.4}$$

which can be written in the form

$$\phi_r(x) = \frac{C_r}{x + 1}\left[(x + 1)\, \frac{d^r}{dx^r}\,(x^2 - 1)^r + r\, \frac{d^{r-1}}{dx^{r-1}}\,(x^2 - 1)^r\right]$$

or, by making use of the relationship (8.9.12), in the form

$$\phi_r(x) = 2^r r!\, C_r \left[P_r(x) + \frac{x - 1}{r + 1}\, P_r'(x)\right] \tag{8.11.5}$$

It is convenient to take

$$C_r = \frac{1}{2^r r!} \tag{8.11.6}$$




Then, noticing that here r = m − 1, since only one abscissa is preassigned,
we conclude that the m − 1 free abscissas are the zeros of

$$\phi_{m-1}(x) = P_{m-1}(x) + \frac{x - 1}{m}\, P_{m-1}'(x) = \frac{P_{m-1}(x) + P_m(x)}{1 + x} \tag{8.11.7}$$

where the last form follows from the recurrence formula (8.5.7). The leading
coefficient is found to be

$$A_{m-1} = \frac{(2m - 1)!}{2^{m-1}\, m\, [(m - 1)!]^2} \tag{8.11.8}$$

With this result, we notice next that

$$\begin{aligned}
\bar\gamma_{m-1} &= \int_{-1}^{1} (1 + x)\, \phi_{m-1}(x)\, \phi_{m-1}(x)\,dx \\
&= A_{m-1} \int_{-1}^{1} (1 + x)\, x^{m-1}\, \phi_{m-1}(x)\,dx \\
&= \frac{A_{m-1}}{2^{m-1}(m - 1)!} \int_{-1}^{1} x^{m-1}\, \frac{d^{m-1}}{dx^{m-1}}\left[(1 + x)(x^2 - 1)^{m-1}\right] dx
\end{aligned}$$

and an (m − 1)-fold integration by parts, followed by the use of (8.9.5), leads
to the simple result

$$\bar\gamma_{m-1} = \frac{2}{m} \tag{8.11.9}$$

Thus, by introducing (8.11.8) and (8.11.9) into (8.10.14), we obtain the weights

$$W_i = -\frac{2(2m + 1)}{m(m + 1)}\, \frac{1}{(1 + x_i)\, \phi_{m-1}'(x_i)\, \phi_m(x_i)} \qquad (x_i \ne -1) \tag{8.11.10}$$

corresponding to the m − 1 free abscissas. By making appropriate use of the
formula (8.5.7), together with the fact that $\phi_{m-1}(x_i) = 0$ implies

$$\frac{1 - x_i}{m}\, P_{m-1}'(x_i) = P_{m-1}(x_i) = -P_m(x_i)$$

we find, after some manipulation, that

$$\phi_{m-1}'(x_i) = \frac{2\, P_{m-1}'(x_i)}{1 + x_i} \qquad \phi_m(x_i) = -\frac{2m + 1}{m + 1}\, P_{m-1}(x_i)$$

so that (8.11.10) reduces to

$$W_i = \frac{1}{m^2}\, \frac{1 - x_i}{[P_{m-1}(x_i)]^2} = \frac{1}{(1 - x_i)\,[P_{m-1}'(x_i)]^2} \qquad (x_i \ne -1) \tag{8.11.11}$$

The weight corresponding to the abscissa x = −1 follows from (8.10.16)
in the form

$$W_i = \frac{1}{\phi_{m-1}(-1)} \int_{-1}^{1} \phi_{m-1}(x)\,dx \qquad (x_i = -1) \tag{8.11.12}$$

We obtain first

$$\int_{-1}^{1} \phi_{m-1}(x)\,dx = \int_{-1}^{1} \left[P_{m-1}(x) + \frac{x - 1}{m}\, P_{m-1}'(x)\right] dx$$

or, after integrating the second member by parts and noticing that the first
member integrates to zero (when m > 1),

$$\int_{-1}^{1} \phi_{m-1}(x)\,dx = \frac{2}{m}\, P_{m-1}(-1) = \frac{2}{m}\,(-1)^{m-1}$$

since $P_{m-1}(-1) = (-1)^{m-1}$. By making use of the additional fact that
$P_{m-1}'(-1) = (-1)^m\, m(m - 1)/2$, we obtain also

$$\phi_{m-1}(-1) = (-1)^{m-1}\, m \tag{8.11.13}$$

and hence (8.11.12) becomes

$$W_i = \frac{2}{m^2} \qquad (x_i = -1) \tag{8.11.14}$$

The error term is obtained, by use of (8.10.22), in the form

$$E = \frac{2^{2m-1}\, m\, [(m - 1)!]^4}{[(2m - 1)!]^3}\, f^{(2m-1)}(\xi) \qquad (|\xi| < 1) \tag{8.11.15}$$

Thus, in summary, we have obtained the quadrature formula

$$\int_{-1}^{1} f(x)\,dx = \frac{2}{m^2}\, f(-1) + \sum_{k=1}^{m-1} W_k f(x_k) + E \tag{8.11.16}$$

where $x_i$ is the ith zero of the polynomial (8.11.7), and where the weights are
defined by (8.11.11) and are positive. This formula is one of several attributed
to Radau.
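The construction just summarized can be sketched numerically as follows. This is an illustrative fragment, not part of the original text: the names `legendre`, `bisect`, and `radau` and the parameter `grid` are hypothetical, and the zeros of (8.11.7) are located by a simple scan for sign changes followed by bisection, which is assumed adequate only for moderate m.

```python
def legendre(n, x):
    """Return (P_n(x), P_{n-1}(x)) from the three-term recurrence (P_{-1} taken as 0)."""
    p_prev, p = 0.0, 1.0
    for k in range(n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p, p_prev

def bisect(g, lo, hi, steps=80):
    """Locate a zero of g in [lo, hi], assuming g changes sign there."""
    g_lo = g(lo)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0.0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)

def radau(m, grid=2000):
    """Abscissas and weights of (8.11.16) for m >= 2 points, with x = -1 fixed."""
    def phi(x):
        # phi_{m-1}(x) in the last form of (8.11.7)
        p_m, p_m1 = legendre(m, x)
        return (p_m1 + p_m) / (1.0 + x)

    nodes, weights = [-1.0], [2.0 / m ** 2]                 # weight from (8.11.14)
    xs = [-1.0 + (2 * j + 1) / grid for j in range(grid)]   # scan points inside (-1, 1)
    for lo, hi in zip(xs, xs[1:]):
        if phi(lo) * phi(hi) < 0.0:
            x = bisect(phi, lo, hi)
            p_m1 = legendre(m - 1, x)[0]
            nodes.append(x)
            weights.append((1.0 - x) / (m * p_m1) ** 2)     # first form of (8.11.11)
    return nodes, weights

nodes3, weights3 = radau(3)
```

For m = 2 the sketch gives the abscissas −1 and 1/3 with weights 1/2 and 3/2, and the weights for any m sum to 2, as required by the first relation of (8.10.20).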

The first six of the polynomials are found to be of the form

$$\begin{aligned}
\phi_0(x) &= 1 & \phi_1(x) &= \tfrac12(3x - 1) \\
\phi_2(x) &= \tfrac12(5x^2 - 2x - 1) & \phi_3(x) &= \tfrac18(35x^3 - 15x^2 - 15x + 3) \\
\phi_4(x) &= \tfrac{1}{40}(315x^4 - 140x^3 - 210x^2 + 60x + 15) \\
\phi_5(x) &= \tfrac{1}{16}(231x^5 - 105x^4 - 210x^3 + 70x^2 + 35x - 5)
\end{aligned} \tag{8.11.17}$$

and additional ones can be obtained from the recurrence formula

$$\phi_{r+1}(x) = \frac{1}{(r + 2)(2r + 1)} \left\{[(2r + 1)(2r + 3)x - 1]\, \phi_r(x) - r(2r + 3)\, \phi_{r-1}(x)\right\} \tag{8.11.18}$$

or by reference to (8.11.7).
In the simplest nontrivial case, m = 2, there follows $x_1 = \tfrac13$. The weight
$W_1$ is found to be $\tfrac32$, and the weight relative to x = −1 to be $\tfrac12$. Thus, the best
two-point formula with x = −1 preassigned is of the form

$$\int_{-1}^{1} f(x)\,dx = \tfrac12 f(-1) + \tfrac32 f(\tfrac13) + \tfrac{2}{27} f'''(\xi) \qquad (|\xi| < 1) \tag{8.11.19}$$

By setting x = 2t − 1, and writing f(2t − 1) = F(t), we may rewrite this
formula in the form

$$\int_0^1 F(t)\,dt = \tfrac14 F(0) + \tfrac34 F(\tfrac23) + \tfrac{1}{216} F'''(\eta) \qquad (0 < \eta < 1) \tag{8.11.20}$$

and similar forms can be obtained in the other cases. The abscissas and weights
corresponding to formulas for which 2 ≤ m ≤ 5 are listed (to six digits) in
Table 8.4. More extensive tabulations are listed in the references.


Table 8.4

m    Abscissas      Weights

2    −1             1/2
      0.333333      3/2

3    −1             0.222222
     −0.289898      1.024972
      0.689898      0.752806

4    −1             0.125000
     −0.575319      0.657689
      0.181066      0.776387
      0.822824      0.440925

5    −1             0.080000
     −0.720480      0.446207
     −0.167181      0.623653
      0.446314      0.562712
      0.885792      0.287427


8.12 Lobatto Quadrature


In the case when both ends of the interval [−1, 1] are preassigned as abscissas,
the weighting function being unity, the derivation is quite similar to that of the
preceding section. Thus, with

$$v(x) = x^2 - 1 \tag{8.12.1}$$

it is found that

$$\phi_r(x) = \frac{C_r}{x^2 - 1}\, \frac{d^r}{dx^r}\,(x^2 - 1)^{r+1} \tag{8.12.2}$$

and that, in accordance with (8.9.12), if we set

$$C_r = \frac{r + 2}{2^{r+1}\, r!} \tag{8.12.3}$$

this result is of the form $\phi_r(x) = P_{r+1}'(x)$. Hence, since here r = m − 2,
the free abscissas are the zeros of the polynomial

$$\phi_{m-2}(x) = P_{m-1}'(x) \tag{8.12.4}$$

The additional results

$$A_{m-2} = \frac{(2m - 2)!}{2^{m-1}\,(m - 1)!\,(m - 2)!} \tag{8.12.5}$$

and

$$\bar\gamma_{m-2} = -\frac{2m(m - 1)}{2m - 1} \tag{8.12.6}$$

the negative sign in (8.12.6) being a consequence of the fact that here v(x)
is negative in (−1, 1), are obtained by methods similar to those of the preceding
section. Next the weights corresponding to the free abscissas are obtained in
the form

$$W_i = -\frac{2m}{(1 - x_i^2)\, P_{m-1}''(x_i)\, P_m'(x_i)} \qquad (x_i \ne \pm 1)$$

which can be rewritten more conveniently as

$$W_i = \frac{2}{m(m - 1)\,[P_{m-1}(x_i)]^2} \qquad (x_i \ne \pm 1) \tag{8.12.7}$$

The weights corresponding to the fixed abscissas x = ±1 are found to be equal
and to have the value

$$W_i = \frac{P_{m-1}(1)}{P_{m-1}'(1)} = -\frac{P_{m-1}(-1)}{P_{m-1}'(-1)} = \frac{2}{m(m - 1)} \qquad (x_i = \pm 1) \tag{8.12.8}$$

which is the same as that given by the right-hand member of (8.12.7) when
$x_i = \pm 1$.
The error term is obtained from (8.10.22) in the form

$$E = -\frac{2^{2m-1}\, m\,(m - 1)^3\, [(m - 2)!]^4}{(2m - 1)\, [(2m - 2)!]^3}\, f^{(2m-2)}(\xi) \qquad (|\xi| < 1) \tag{8.12.9}$$

and the corresponding quadrature formula

$$\int_{-1}^{1} f(x)\,dx = \frac{2}{m(m - 1)}\,[f(1) + f(-1)] + \sum_{k=1}^{m-2} W_k f(x_k) + E \tag{8.12.10}$$

where $x_i$ is the ith zero of $P_{m-1}'(x)$, and $W_i$ is given by (8.12.7) and is positive,
is known as Lobatto’s quadrature formula.

In the simplest nontrivial case, m = 3, the free abscissa is found from
the equation $P_2'(x) = 3x = 0$ to be x = 0, as would be expected from
the symmetry. The corresponding weight is found to be $\tfrac43$, whereas the weights
corresponding to x = ±1 are each $\tfrac13$. Hence, as also might have been anticipated,
the Lobatto formula reduces in this simple case to Simpson’s rule.
The abscissas and weights corresponding to formulas for which 3 ≤ m ≤ 6
are listed (to six digits) in Table 8.5. More elaborate tabulations are listed in the
references.


Table 8.5

m    Abscissas      Weights

3     0             4/3
     ±1             1/3

4    ±0.447214      5/6
     ±1             1/6

5     0             32/45
     ±0.654654      49/90
     ±1             1/10

6    ±0.285232      0.554858
     ±0.765055      0.378475
     ±1             0.066667
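The weight formulas (8.12.7) and (8.12.8) can be checked against the tabulated values with a short sketch such as the following, which is illustrative and not part of the original text. The names `legendre`, `lobatto_rule`, and `apply_rule` are hypothetical, and the free abscissas are supplied by the caller rather than computed.

```python
import math

def legendre(n, x):
    """Return P_n(x) from the three-term recurrence."""
    p_prev, p = 0.0, 1.0
    for k in range(n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def lobatto_rule(m, free_abscissas):
    """Assemble (8.12.10) from given zeros of P'_{m-1}, using (8.12.7) and (8.12.8)."""
    end = 2.0 / (m * (m - 1))
    inner = sorted(free_abscissas)
    nodes = [-1.0] + inner + [1.0]
    weights = [end] + [2.0 / (m * (m - 1) * legendre(m - 1, x) ** 2) for x in inner] + [end]
    return nodes, weights

def apply_rule(nodes, weights, f):
    """Evaluate the weighted sum of ordinates, without the error term."""
    return sum(w * f(x) for x, w in zip(nodes, weights))

# m = 5: the zeros of P_4'(x) are 0 and +-sqrt(3/7), listed as 0.654654 in Table 8.5.
r = math.sqrt(3.0 / 7.0)
nodes, weights = lobatto_rule(5, [-r, 0.0, r])
sixth_moment = apply_rule(nodes, weights, lambda x: x ** 6)
```

With five points the degree of precision is 2m − 3 = 7, so the sixth moment 2/7 is reproduced, whereas the eighth moment is not.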


When the Lobatto formula is applied to a function f(x) which vanishes 
at both ends of the interval of integration, so that only r = m — 2 ordinates 
are actually involved in the calculation, the degree of precision is 2m — 3 = 
2r + 1. Similarly, when the Radau formula is applied to an integrand which 
vanishes at the lower limit, so that r = m — 1 ordinates are used, the degree of 
precision is 2m — 2 = 2r. Thus, in such cases, a higher effective degree of 
precision is attained than that afforded by the formulas of gaussian type, in 
which the use of r ordinates leads to a degree of precision of 2r — 1. However, 
in special cases where f(x) has a singular derivative at an end point at which
f(x) = 0, the associated advantage of an open formula may offset this consideration.


8.13 Convergence of Gaussian-Quadrature Sequences


For the purpose of considering the behavior of a sequence of approximations 
generated by a specific gaussian (or gaussian-type) formula as the number m of 
ordinates is increased, we express the m-point error in the form 


$$E_m \equiv R_m[f] = \int_a^b w(x) f(x)\,dx - \sum_{k=1}^{m} W_k^{(m)} f(x_k^{(m)}) \tag{8.13.1}$$


where the weights and abscissas now are supplied with the previously implied 
index m. 

We first restrict attention to the cases when [a, b] is finite. In order to
include all such cases so far considered in this chapter as well as a class of other
ones, we suppose here only that the weighting function w(x) is nonnegative in
(a, b) and all the weights $W_k^{(m)}$ are positive:

$$w(x) \ge 0 \quad (a \le x \le b) \qquad W_k^{(m)} > 0 \tag{8.13.2}$$

We also assume that the degree of precision of the formula in question is
positive and increasing with m. (In the true gaussian cases that degree is 2m − 1;
for Radau or Lobatto quadrature it is 2m − 2 or 2m − 3, respectively.) We
require of f(x) only that it be continuous on [a, b].
With these assumptions, given any positive number $\varepsilon$, no matter how small,
the Weierstrass approximation theorem (Sec. 1.2) guarantees the existence of a
polynomial p(x) for which

$$|f(x) - p(x)| < \varepsilon \qquad (a \le x \le b) \tag{8.13.3}$$


If M is the degree of p(x), we next take m sufficiently large that the degree of
precision of the quadrature formula in question exceeds M, so that $R_m[p(x)] = 0$.
Then, since we have

$$R_m[f] = R_m[f - p] + R_m[p]$$

from the linearity of the operator $R_m$, there follows

$$\begin{aligned}
|R_m[f]| &= \left| \int_a^b w(x)\,[f(x) - p(x)]\,dx - \sum_{k=1}^{m} W_k^{(m)} \left[f(x_k^{(m)}) - p(x_k^{(m)})\right] \right| \\
&\le \int_a^b w(x)\,|f(x) - p(x)|\,dx + \sum_{k=1}^{m} W_k^{(m)} \left|f(x_k^{(m)}) - p(x_k^{(m)})\right| \\
&\le \varepsilon \left[\int_a^b w(x)\,dx + \sum_{k=1}^{m} W_k^{(m)}\right] = 2\varepsilon \int_a^b w(x)\,dx
\end{aligned} \tag{8.13.4}$$

Here use was made of the assumed properties of w(x) and $W_k^{(m)}$ and of the fact
that $R_m[1] = 0$ implies the relation

$$\int_a^b w(x)\,dx = \sum_{k=1}^{m} W_k^{(m)} \tag{8.13.5}$$

Hence, since accordingly $|E_m|$ can be made smaller than any preassigned
positive quantity by taking m to be sufficiently large, we may deduce that in
fact $E_m \to 0$ as $m \to \infty$.
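The conclusion can be illustrated numerically with an integrand that is continuous but not differentiable. The following sketch is not part of the original text: it uses the Chebyshev-Gauss rule (8.8.12), whose weights π/m are positive, with f(x) = |x|, for which the weighted integral over [−1, 1] equals 2. The function name `chebyshev_gauss_error` is hypothetical.

```python
import math

def chebyshev_gauss_error(f, exact, m):
    """Error E_m of the m-point rule (8.8.12) when the exact integral is known."""
    approx = math.pi / m * sum(
        f(math.cos((2 * k - 1) * math.pi / (2 * m))) for k in range(1, m + 1))
    return exact - approx

# |x| is continuous on [-1, 1] but has no derivative at x = 0.
errors = [chebyshev_gauss_error(abs, 2.0, m) for m in (4, 16, 64)]
```

The errors shrink steadily as m grows, although no derivative-based error term such as that of (8.8.12) is available for this integrand.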

Although this conclusion thus follows for all the preceding formulas in this
chapter for which [a, b] is finite, some additional comments are in order. In the
case of the Jacobi-Gauss quadrature of Sec. 8.9, over (−1, 1), by making use of
the Stirling approximation (3.9.6) to the factorial, and of the fact that
$\Gamma(m + k + 1) \equiv (m + k)! \sim m^k m!$ as $m \to \infty$, when k is fixed, we find from
(8.9.8) that

$$E_m \sim \frac{\pi}{2^{2m+\alpha+\beta}}\, \frac{f^{(2m)}(\xi_m)}{(2m)!} \tag{8.13.6}$$

when m is large, in the general formula (8.9.6), where $-1 < \xi_m < 1$. This
result holds, in particular, in the important special cases $\alpha = \beta = 0$ (Legendre-Gauss),
$\alpha = \beta = -\tfrac12$ (Chebyshev-Gauss), $\alpha = \beta = \tfrac12$ (Prob. 24), and
$\alpha = \beta = 1$ [Eq. (8.9.13)].

Thus, if f(z) is an analytic function of the complex variable z in a region
$\mathcal{R}$ of the complex z plane including the real interval −1 ≤ x ≤ 1, and if R is the
shortest distance from a singular point of f(z) to a point in that interval, then,
as in Sec. 3.9, we can deduce from (8.13.6) only that there exists a constant C
such that

$$|E_m| < \frac{C}{(2R)^{2m}} \tag{8.13.7}$$

when m is large. This fact would permit us to deduce convergence only when
f(z) is such that $R > \tfrac12$, whereas actually convergence has just been established
without any restriction on f(x) except for continuity on [−1, 1]. In addition,
however, it suggests that the rate of convergence when R is smaller than about $\tfrac12$
will tend to be significantly less favorable than that described by (8.13.7) when
$R > \tfrac12$.

When the interval of integration is not finite, a less simple approach is
needed since the Weierstrass theorem no longer is available. However, in the
cases of Laguerre-Gauss and Hermite-Gauss quadrature, convergence has been
established when f(x) is continuous in every finite subinterval of [0, ∞) [or
(−∞, ∞)] and also f(x) is such that

$$w(x)\,|f(x)| \le \frac{1}{|x|^{1+\rho}} \tag{8.13.8}$$

for some ρ > 0 when x [or |x|] is sufficiently large.

It may be noted that for Hermite-Gauss quadrature the asymptotic bound
corresponding to (8.13.7) would be proportional to $m!/(\sqrt{2}\,R)^{2m}$, whereas for
Laguerre-Gauss quadrature it would be proportional to $(m!)^2/R^{2m}$, from (8.6.7)
and (8.7.7); both of these quantities increase unboundedly as $m \to \infty$ for any
fixed R. Thus, in these cases an unfavorable rate of convergence may be feared
unless f(z) is an entire function of the complex variable z (that is, is analytic
everywhere in the finite part of the complex z plane).

For example, if $f(x) = 1/(1 + x^2)$, it is seen from the Maclaurin expansion
of f(x) that $f^{(2m)}(x) = (-1)^m (2m)!$ when x = 0, so that in the evaluation of the
integral

$$\int_{-\infty}^{\infty} \frac{e^{-x^2}}{1 + x^2}\,dx \approx 1.34329$$

by m-point Hermite-Gauss quadrature, the formula (8.7.7) would admit the
possibility of an error as large as $\sqrt{\pi}\,(m!/2^m)$ if the appropriate (but unknown)
value of $\xi_m$ were near zero. If this were indeed the case, the error would increase
rapidly with m when m > 2. Thus the above-mentioned convergence implies
that $|\xi_m|$ truly increases rapidly as m increases. The slowness of the convergence
in this case was first noted by Rosser [1950], who reported that the errors
corresponding to the use of 2, 10, and 16 points are about 0.16, 0.0016, and
0.00016, respectively. Similar slow convergence may be expected, more generally,
whenever f(z) is not an analytic function of z, or when f(z) possesses
singularities in the finite part of the complex z plane which are “fairly close”
to the real axis.
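The slow convergence in this example can be reproduced with the following sketch, which is illustrative and not part of the original text. It assumes the standard closed form π e erfc(1) for the integral, builds the Hermite-Gauss abscissas as zeros of the orthonormal Hermite polynomial found by a scan and bisection, and takes each weight as the reciprocal of the sum of squares of the lower-degree orthonormal polynomials at that abscissa. The names `hermite_orthonormal`, `hermite_gauss`, and `error` and the parameter `grid` are hypothetical.

```python
import math

def hermite_orthonormal(m, x):
    """Return [p_0(x), ..., p_m(x)], orthonormal with respect to exp(-x^2)."""
    values = [math.pi ** -0.25]
    p_prev = 0.0
    for j in range(m):
        p_next = (x * math.sqrt(2.0 / (j + 1)) * values[-1]
                  - math.sqrt(j / (j + 1)) * p_prev)
        p_prev = values[-1]
        values.append(p_next)
    return values

def hermite_gauss(m, grid=4000):
    """Abscissas and weights for the weighting function exp(-x^2)."""
    bound = math.sqrt(2.0 * m + 1.0)   # every zero of p_m lies inside (-bound, bound)
    xs = [-bound + (2 * j + 1) * bound / grid for j in range(grid)]
    g = lambda x: hermite_orthonormal(m, x)[m]
    nodes, weights = [], []
    for lo, hi in zip(xs, xs[1:]):
        g_lo = g(lo)
        if g_lo * g(hi) < 0.0:
            for _ in range(80):        # bisection on the bracketing interval
                mid = 0.5 * (lo + hi)
                g_mid = g(mid)
                if g_lo * g_mid <= 0.0:
                    hi = mid
                else:
                    lo, g_lo = mid, g_mid
            x = 0.5 * (lo + hi)
            nodes.append(x)
            weights.append(1.0 / sum(p * p for p in hermite_orthonormal(m, x)[:m]))
    return nodes, weights

exact = math.pi * math.e * math.erfc(1.0)   # assumed closed form, about 1.34329

def error(m):
    nodes, weights = hermite_gauss(m)
    return exact - sum(w / (1.0 + x * x) for x, w in zip(nodes, weights))

errors = {m: error(m) for m in (2, 10, 16)}
```

For m = 2 the abscissas are ±1/√2 with weights √π/2, and the error is about 0.16, in agreement with the first of the values quoted from Rosser.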


8.14 Chebyshev Quadrature 


By imposing various restrictions on the abscissas and/or weights in a formula 
of the type (8.10.1), various classes of quadrature formulas may be obtained 
in addition to those so far considered. In this connection, it may be noticed that, 
if the abscissas are required to be equally spaced, the Newton-Cotes formulas of 
Chap. 3 are obtained when w(x) = 1. In this case, m abscissas are fixed and the 
degree of precision may be expected to be reduced from 2m — 1 to m — 1. 
However, as was seen in Chap. 3, when m is odd, so that the midpoint of the 
interval is one of the abscissas, the degree of precision is increased to m.†
Another interesting class of formulas, associated with the name of Chebyshev,
is that in which all the weights are made equal. Whereas the significance
of these formulas is perhaps more academic than practical, equality of the
weighting coefficients is desirable, not only for convenience, but also in order
that the effects of errors in the ordinates will be minimized. Here the common
weight and the m abscissas are “free,” and it may be expected that a formula
with a degree of precision of at least m may be determinable. However, this
expectation is not always to be realized.

† It should be recalled that m here corresponds to n + 1 in Chap. 3.

We suppose again that the original interval has been transformed into
[−1, 1], so that the desired formula is of the form

$$\int_{-1}^{1} w(x) f(x)\,dx = W \sum_{k=1}^{m} f(x_k) + E[f(x)] \tag{8.14.1}$$

where W is the common weight. It may be noticed first that the weight W
cannot be assigned if the degree of precision is to be positive, but is determined
by the requirement that E = 0 when f(x) = 1 in the form

$$W = \frac{A}{m} \qquad \text{where} \quad A = \int_{-1}^{1} w(x)\,dx \tag{8.14.2}$$

Now we assume that a set of m abscissas $x_i$ exists in [−1, 1] such that the degree
of precision is indeed at least m and, as before, we write

$$\pi(x) = (x - x_1)(x - x_2) \cdots (x - x_m) \tag{8.14.3}$$

Then, following the derivation of Chebyshev, we identify f(x) in particular with
the special function

$$f(x) = \frac{1}{u - x} \qquad (u > 1) \tag{8.14.4}$$

in which case (8.14.1) becomes

$$\int_{-1}^{1} \frac{w(x)}{u - x}\,dx = \frac{A}{m} \sum_{k=1}^{m} \frac{1}{u - x_k} + E\left[\frac{1}{u - x}\right] \tag{8.14.5}$$

The reason for choosing the special function 1/(u − x) is now seen if we
notice that since

$$\log \pi(u) = \sum_{k=1}^{m} \log\,(u - x_k)$$

the finite sum in (8.14.5) can be expressed as

$$\frac{d}{du} \log \pi(u)$$

and hence that equation becomes

$$\int_{-1}^{1} \frac{w(x)}{u - x}\,dx = \frac{A}{m}\, \frac{d}{du}\,[\log \pi(u)] + E\left[\frac{1}{u - x}\right] \tag{8.14.6}$$


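or, after an integration with respect to u,

$$\int_{-1}^{1} w(x) \log\,(u - x)\,dx = \text{const} + \frac{A}{m} \log \pi(u) - Q(u) \tag{8.14.7}$$

where

$$Q(u) = \int_u^{\infty} E\left[\frac{1}{u - x}\right] du \tag{8.14.8}$$

Equation (8.14.7) can be resolved in the form

$$\pi(u) = C_m \exp\left[\frac{m}{A} \int_{-1}^{1} w(x) \log\,(u - x)\,dx + \frac{m}{A}\, Q(u)\right]$$

or, equivalently, in the form

$$\pi(u) = C_m\, u^m \exp\left[\frac{m}{A} \int_{-1}^{1} w(x) \log\left(1 - \frac{x}{u}\right) dx\right] \exp\left[\frac{m}{A}\, Q(u)\right] \tag{8.14.9}$$

Now the error term in (8.14.6) is expressible in the form

$$E\left[\frac{1}{u - x}\right] = \int_{-1}^{1} G(s)\, \frac{\partial^{m+1}}{\partial s^{m+1}}\left(\frac{1}{u - s}\right) ds = (m + 1)! \int_{-1}^{1} \frac{G(s)}{(u - s)^{m+2}}\,ds \tag{8.14.10}$$

where G(s) is the influence function defined by the relation

$$m!\, G(s) = \int_{-1}^{1} w(x)\,(x - s)_+^m\,dx - \frac{A}{m} \sum_{k=1}^{m} (x_k - s)_+^m \tag{8.14.11}$$

in accordance with (5.10.15) and (5.10.16). Accordingly, there follows

$$Q(u) = m! \int_{-1}^{1} \frac{G(s)}{(u - s)^{m+1}}\,ds \tag{8.14.12}$$

For present purposes, it is not necessary to evaluate this expression explicitly.
However, it is important to notice that it can be expanded in the form

$$\begin{aligned}
Q(u) &= \frac{m!}{u^{m+1}} \int_{-1}^{1} G(s) \left[1 + (m + 1)\,\frac{s}{u} + \frac{(m + 1)(m + 2)}{2!}\, \frac{s^2}{u^2} + \cdots\right] ds \\
&= \frac{g_0}{u^{m+1}} + \frac{g_1}{u^{m+2}} + \cdots
\end{aligned} \tag{8.14.13}$$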


since u > 1, where the coefficient $g_k$ is a certain multiple of $\int_{-1}^{1} s^k G(s)\,ds$.
Similarly, we see that

$$\int_{-1}^{1} w(x) \log\left(1 - \frac{x}{u}\right) dx = -\left(\frac{c_1}{u} + \frac{c_2}{2u^2} + \cdots + \frac{c_k}{k u^k} + \cdots\right) \tag{8.14.14}$$

when u > 1, where

$$c_k = \int_{-1}^{1} x^k w(x)\,dx \tag{8.14.15}$$

Hence (8.14.9) can be expanded in the form

$$\begin{aligned}
\pi(u) &= C_m\, u^m \exp\left[-\frac{m}{A}\left(\frac{c_1}{u} + \frac{c_2}{2u^2} + \cdots\right)\right] \exp\left[\frac{m}{A}\left(\frac{g_0}{u^{m+1}} + \cdots\right)\right] \\
&= C_m\, u^m \left(1 - \frac{m c_1}{A}\, \frac{1}{u} + \cdots\right)\left(1 + \frac{m g_0}{A}\, \frac{1}{u^{m+1}} + \cdots\right)
\end{aligned} \tag{8.14.16}$$

where the two relevant power series in $u^{-1}$ converge when u > 1. But, since
$\pi(u)$ is a polynomial of degree m, the product of the two series will terminate
before the term containing $u^{-m-1}$. Thus the terms in the second series, after the
leading term, therefore do not enter into the determination of the terms which
will remain in the product, but serve only to bring about the cancellation of all
terms involving $u^{-m-1}$, $u^{-m-2}$, and so forth. Hence the second series can be
disregarded, and the desired polynomial can be obtained by merely terminating
the first series with the term involving $u^{-m}$. Also, since the coefficient of $u^m$
in $\pi(u)$ is to be unity, we must take $C_m = 1$.

It thus follows that if $\pi(x)$ exists such that (8.14.1) has a degree of precision
of at least m, then $\pi(x)$ is defined by the expansion

$$\begin{aligned}
\exp\left[\frac{m}{A} \int_{-1}^{1} w(t) \log\,(x - t)\,dt\right] &= x^m \exp\left[\frac{m}{A} \int_{-1}^{1} w(t) \log\left(1 - \frac{t}{x}\right) dt\right] \\
&= x^m \exp\left[-\frac{m}{A}\left(\frac{c_1}{x} + \frac{c_2}{2x^2} + \cdots\right)\right] = x^m - \frac{m c_1}{A}\, x^{m-1} + \cdots
\end{aligned} \tag{8.14.17}$$

where the last series is to be terminated with the last term having a nonnegative
exponent.
In the special case when

$$w(x) = 1 \tag{8.14.18}$$

and hence also

$$A = 2 \qquad W = \frac{2}{m} \tag{8.14.19}$$

the first four terms of the expansion of

$$x^m \exp\left[\frac{m}{2} \int_{-1}^{1} \log\left(1 - \frac{t}{x}\right) dt\right] = x^m \exp\left[-m\left(\frac{1}{6x^2} + \frac{1}{20x^4} + \frac{1}{42x^6} + \cdots\right)\right]$$

are found to be

$$\pi(x) = x^m - \frac{m}{6}\, x^{m-2} + \frac{m}{360}\,(5m - 18)\, x^{m-4} - \frac{m}{45360}\,(35m^2 - 378m + 1080)\, x^{m-6} + \cdots \tag{8.14.20}$$

where the series is to be terminated with the first term if m = 0 or 1, with the
second if m = 2 or 3, and so forth. If the mth such polynomial is denoted here
by $G_m(x)$, the first six such polynomials are thus obtained as follows:


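$$\begin{aligned}
G_0(x) &= 1 & G_1(x) &= x \\
G_2(x) &= \tfrac13(3x^2 - 1) & G_3(x) &= \tfrac12(2x^3 - x) \\
G_4(x) &= \tfrac{1}{45}(45x^4 - 30x^2 + 1) & G_5(x) &= \tfrac{1}{72}(72x^5 - 60x^3 + 7x)
\end{aligned} \tag{8.14.21}$$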


It is seen that the polynomials of even and odd degrees are even and odd 
functions of x, respectively, so that their zeros are symmetrically placed about 
x = 0. 
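The polynomials (8.14.21) can be regenerated mechanically from the exponential expansion with exact rational arithmetic. The following sketch is illustrative and not part of the original text; the function name `chebyshev_equal_weight_poly` is hypothetical, and it assumes w(x) = 1, so that the exponent is −m(y/6 + y²/20 + y³/42 + ···) with y = x⁻².

```python
from fractions import Fraction

def chebyshev_equal_weight_poly(m):
    """Coefficients [e_0, e_1, ...] with G_m(x) = sum over n of e_n * x**(m - 2n).

    The e_n are the leading coefficients of exp(S(y)) in powers of y = x**(-2),
    where S(y) = -m * sum over k >= 1 of y**k / (2k(2k + 1)); only the terms
    with a nonnegative exponent m - 2n are kept.
    """
    terms = m // 2
    a = [Fraction(0)] + [Fraction(-m, 2 * k * (2 * k + 1)) for k in range(1, terms + 1)]
    e = [Fraction(1)]
    for n in range(1, terms + 1):
        # From E' = S'E for E = exp(S):  n e_n = sum_{k=1}^{n} k a_k e_{n-k}
        e.append(sum(k * a[k] * e[n - k] for k in range(1, n + 1)) / n)
    return e

g4 = chebyshev_equal_weight_poly(4)   # coefficients of x^4, x^2, x^0
g5 = chebyshev_equal_weight_poly(5)   # coefficients of x^5, x^3, x^1
```

The same routine extends the list beyond m = 5; whether the resulting zeros are real must still be examined separately for each m.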

It has been found that the zeros of the polynomials $G_1(x), G_2(x), \ldots, G_7(x)$
and $G_9(x)$ are all real, that they lie inside the interval [−1, 1], and that
the quadrature formula (8.14.1), with abscissas identified with a set of such
zeros, accordingly does indeed have a degree of precision equal to or greater
than the number of abscissas, when w(x) = 1. However, six of the zeros of
$G_8(x)$ are nonreal, and each $G_m(x)$ for m ≥ 10 possesses at least one pair of
nonreal zeros (see Bernstein [1937]). Thus, when w(x) = 1, the quadrature
formula is useful only when m ≤ 7 and m = 9.

The abscissas corresponding to the formula

$$\int_{-1}^{1} f(x)\,dx = \frac{2}{m} \sum_{k=1}^{m} f(x_k) + E \tag{8.14.22}$$

for all relevant values of m, are listed to six digits in Table 8.6.

Table 8.6

m    Abscissas        m    Abscissas

2    ±0.577350        7     0
                           ±0.323912
3     0                    ±0.529657
     ±0.707107             ±0.883862

4    ±0.187592        9     0
     ±0.794654             ±0.167906
                           ±0.528762
5     0                    ±0.601019
     ±0.374541             ±0.911589
     ±0.832497

6    ±0.266635
     ±0.422519
     ±0.866247

Whereas the appropriate error term in each case can be expressed in the form

$$E = \int_{-1}^{1} G(s)\, f^{(m+1)}(s)\,ds \tag{8.14.23}$$

where G(s) is defined by (8.14.11), with w(x) = 1, recourse to the third method
of Sec. 5.10 leads more simply to the desired results. For this purpose, we may
notice first that, since the coefficient of $x^m$ in $G_m(x)$ is unity, there follows
$w(x)\pi(x) = G_m(x)$. Further, by integrating the expressions given in (8.14.21),
and determining the constant of integration in each case such that the integral
vanishes at one (and hence both) of the limits ±1, there follows

$$\begin{aligned}
G_1(x) &= \left[\tfrac12(x^2 - 1)\right]' \\
G_2(x) &= \left[\tfrac13(x^3 - x)\right]' = \left[\tfrac{1}{12}(x^2 - 1)^2\right]'' \\
G_3(x) &= \left[\tfrac14 x^2 (x^2 - 1)\right]' \\
G_4(x) &= \left[\tfrac{1}{45}(9x^5 - 10x^3 + x)\right]' = \left[\tfrac{1}{90}(x^2 - 1)^2(1 + 3x^2)\right]'' \\
G_5(x) &= \left[\tfrac{1}{144}(x^2 - 1)(24x^4 - 6x^2 + 1)\right]'
\end{aligned} \tag{8.14.24}$$

and so forth. Thus, when m is odd, there follows $G_m(x) = V_m'(x)$, where $V_m$
vanishes at the ends of the interval [−1, 1] and is of constant sign inside that
interval, whereas, when m is even, there follows $G_m(x) = V_m''(x)$, where $V_m$ and
$V_m'$ vanish at the ends of the interval and $V_m$ is of constant sign in the interior.


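It follows from (5.10.38), with n + 1 = m, that the error $E_m$ associated
with an m-point formula is given by

$$E_m = \begin{cases} \dfrac{K_m}{(m + 1)!}\, f^{(m+1)}(\xi) & (m \text{ odd}) \\[2ex] \dfrac{K_m}{(m + 2)!}\, f^{(m+2)}(\xi) & (m \text{ even}) \end{cases} \tag{8.14.25}$$

where

$$K_m = \begin{cases} -\displaystyle\int_{-1}^{1} V_m(x)\,dx = \int_{-1}^{1} x\, G_m(x)\,dx & (m \text{ odd}) \\[2ex] 2\displaystyle\int_{-1}^{1} V_m(x)\,dx = \int_{-1}^{1} x^2 G_m(x)\,dx & (m \text{ even}) \end{cases}$$

The first six of these values are found to be $K_1 = \tfrac23$, $K_2 = \tfrac{8}{45}$, $K_3 = \tfrac{1}{15}$,
$K_4 = \tfrac{32}{945}$, $K_5 = \tfrac{13}{756}$, $K_6 = \tfrac{16}{1575}$.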

In the case m = 2, formula (8.14.22) reduces to the Legendre-Gauss two- 
point formula. It may be noticed that the degree of precision is m when m is 
odd, but is m + 1 when m is even. More generally, whenever w(x) is an even 
function of x, it is apparent from the symmetry that both members of (8.14.1) 
will vanish when f(x) is any polynomial of odd degree (or any odd function of x). 
Hence, in such cases, if m is even and if the degree of precision is at least m, 
then it is also at least m + 1.†
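The parity rule just stated can be checked numerically. The sketch below is only an illustration, not part of the text's derivation: it applies the equal-weight formula (8.14.22) to the monomials x^r, using six-digit abscissas for m = 2, ..., 5 (the m = 4 value 0.794654 is computed here from the zeros of G_4), and reports the largest r for which the error stays at rounding level. The names `ABSCISSAS`, `quadrature_error`, and `degree_of_precision` are hypothetical.

```python
# Nonnegative six-digit Chebyshev abscissas for w(x) = 1 on [-1, 1].
ABSCISSAS = {
    2: [0.577350],
    3: [0.0, 0.707107],
    4: [0.187592, 0.794654],
    5: [0.0, 0.374541, 0.832497],
}

def quadrature_error(m, r):
    """E in (8.14.22) for f(x) = x**r: exact integral minus (2/m) * sum of f(x_k)."""
    nodes = []
    for x in ABSCISSAS[m]:
        nodes.extend([x] if x == 0.0 else [-x, x])
    exact = 0.0 if r % 2 else 2.0 / (r + 1)
    return exact - (2.0 / m) * sum(x**r for x in nodes)

def degree_of_precision(m, tol=1e-5):
    """Largest r such that x**0, ..., x**r are integrated to within tol."""
    r = 0
    while abs(quadrature_error(m, r)) < tol:
        r += 1
    return r - 1

for m in sorted(ABSCISSAS):
    print(m, degree_of_precision(m))
```

The tolerance reflects the six-digit rounding of the abscissas. The first nonvanishing error for m = 3 is the error for x^4, which should agree with K_3 = 1/15.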

The Chebyshev-Gauss formula of Sec. 8.8, with w(x) = (1 − x²)^{−1/2}, is a
particularly notable member of the general class of formulas considered in
this section, since in that case it was seen that the degree of precision attains its
maximum value 2m − 1, for all m ≥ 1, in spite of the fact that the weights
are equal. It can be rederived here by noticing that (8.14.2) gives A = π, and
hence W = π/m, in accordance with (8.8.11). Equation (8.14.17) then gives

    \exp\left[\frac{1}{W}\int_{-1}^{1} w(t)\log(x - t)\,dt\right]
        = \exp\left[\frac{m}{\pi}\int_{-1}^{1}\frac{\log(x - t)}{\sqrt{1 - t^2}}\,dt\right]
        = \left[\frac{x + \sqrt{x^2 - 1}}{2}\right]^m
        = \left(\frac{x}{2}\right)^m\left[1 + \sqrt{1 - \frac{1}{x^2}}\,\right]^m

when x > 1, and the polynomial part of the expansion of the last member in
descending powers of x can be shown to be identical with T_0(x), when m = 0,
and with the expanded form of

    2^{1-m}\,T_m(x) = 2^{1-m}\cos(m\cos^{-1} x)

when m ≥ 1.
† The difference between this situation and that relevant to Newton-Cotes quadrature
is a consequence of the fact that there the minimum degree of precision is m − 1,
where m ordinates are used. Thus an increase of (at least) one degree occurs if
m − 1 is even and hence m is odd.


8.15 Algebraic Derivations


Any specific one of the quadrature formulas considered in this chapter can be
obtained directly by purely algebraic methods, without the use of properties of
orthogonal functions. In cases in which the weighting function is given empirically,
or in which only a single specific formula is desired, such methods are
often to be preferred. For this reason, they are discussed briefly in this section.
We suppose here that the formula is to be of the form

    \int_a^b w(x) f(x)\,dx = \sum_{k=1}^{m} W_k f(x_k) + E        (8.15.1)

where w(x) ≥ 0 in [a, b], and that the abscissas and weights are to be chosen in
such a way that the degree of precision is at least m − 1, so that E = 0 at least
when f(x) = x^r (r = 0, 1, ..., m − 1). If we define the rth moment M_r,
associated with w(x) over [a, b], by the equation

    \int_a^b x^r w(x)\,dx = M_r    (r = 0, 1, 2, ...)        (8.15.2)

the requirement that the degree of precision of (8.15.1) be at least N is represented
by the N + 1 conditions

    \sum_{k=1}^{m} W_k x_k^r = M_r    (r = 0, 1, ..., N)        (8.15.3)

Whereas these equations are linear in the m weights W_k, they are nonlinear
in the m abscissas x_k, and the purpose of this section is to indicate in what way
the difficulties associated with this nonlinearity can be minimized.

The procedures to be used, in those situations in which no conditions are 
imposed on the weights, may be easily generalized from the simple case in which 
m = 2. Hence, in order to simplify the notation, we consider that case specif- 
ically, but describe the procedures in general terms. The simplest case, clearly, 
is that in which the m abscissas are preassigned. Then, unless they are chosen in 
a special way, we can require only that the degree of precision N be at least 
m — 1. When m = 2, the two conditions to be satisfied are then 


    W_1 + W_2 = M_0
                                        (8.15.4)
    W_1 x_1 + W_2 x_2 = M_1

Since the abscissas are assigned, we have m simultaneous linear equations in the
m unknown weights, and it can be shown that these equations always possess
a unique solution.
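As an illustration of the preassigned-abscissa case, the following sketch solves the m linear conditions (8.15.3), with N = m − 1, for the weights by exact rational elimination. It assumes distinct abscissas. The function name `weights_for_fixed_abscissas` and the choice of Gauss-Jordan elimination are assumptions made for the example, not something prescribed by the text.

```python
from fractions import Fraction

def weights_for_fixed_abscissas(xs, moments):
    """Solve sum_k W_k * x_k**r = M_r (r = 0..m-1) for W_1..W_m exactly."""
    m = len(xs)
    # Augmented matrix: row r holds x_1**r, ..., x_m**r followed by M_r.
    a = [[Fraction(x) ** r for x in xs] + [Fraction(moments[r])] for r in range(m)]
    for col in range(m):
        # Distinct abscissas guarantee that a nonzero pivot exists.
        pivot = next(i for i in range(col, m) if a[i][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for i in range(m):
            if i != col:
                factor = a[i][col]
                a[i] = [vi - factor * vc for vi, vc in zip(a[i], a[col])]
    return [row[m] for row in a]

# w(x) = 1 on [-1, 1] with abscissas -1, 0, 1: moments M_0, M_1, M_2.
print(weights_for_fixed_abscissas([-1, 0, 1], [2, 0, Fraction(2, 3)]))
```

With these three equally spaced abscissas the computed weights are those of Simpson's rule on [−1, 1].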

On the other extreme, we have the gaussian case, in which no constraints
are imposed and in which the degree of precision is to be 2m − 1. In the case
m = 2, the four conditions to be satisfied are then of the form

    W_1 + W_2 = M_0
    W_1 x_1 + W_2 x_2 = M_1
                                        (8.15.5)
    W_1 x_1^2 + W_2 x_2^2 = M_2
    W_1 x_1^3 + W_2 x_2^3 = M_3

representing four equations in the four unknown quantities x_1, x_2, W_1, and
W_2. In order to solve these equations, we let x_1 and x_2 be the zeros of π(x)

    \pi(x) = (x - x_1)(x - x_2) = x^2 + \alpha_1 x + \alpha_2        (8.15.6)

and attempt first to determine the coefficients α_1 and α_2. If we multiply the
third equation of (8.15.5) by 1, the second by α_1, and the first by α_2, and add
the results, making use of the fact that

    x_1^2 + \alpha_1 x_1 + \alpha_2 = 0        x_2^2 + \alpha_1 x_2 + \alpha_2 = 0

we obtain the condition

    M_2 + M_1\alpha_1 + M_0\alpha_2 = 0        (8.15.7)

Similarly, from the fourth, third, and second equations we obtain the
requirement

    M_3 + M_2\alpha_1 + M_1\alpha_2 = 0        (8.15.8)

The last two equations are linear in α_1 and α_2. If M_1² ≠ M_0 M_2, they
possess a unique solution. (This can be guaranteed when w ≥ 0 in [a, b].)
The abscissas x_1 and x_2 are then determined as the roots of the algebraic
equation π(x) = 0, provided that π(x) has real roots, and the weights W_1 and W_2
are finally determined from any two (say, the first two) equations of (8.15.5).
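The procedure just described can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the four moments are supplied as floating-point numbers, the linear pair (8.15.7)-(8.15.8) is solved by Cramer's rule, and the name `gauss_two_point` is hypothetical.

```python
import math

def gauss_two_point(M0, M1, M2, M3):
    """Two-point gaussian abscissas and weights from the moments M_0..M_3."""
    det = M1 * M1 - M0 * M2
    if det == 0:
        raise ValueError("M_1**2 == M_0*M_2: no unique solution")
    # Solve M1*a1 + M0*a2 = -M2 and M2*a1 + M1*a2 = -M3 by Cramer's rule.
    a1 = (M0 * M3 - M1 * M2) / det
    a2 = (M2 * M2 - M1 * M3) / det
    disc = a1 * a1 - 4.0 * a2
    if disc <= 0:
        raise ValueError("pi(x) has no pair of distinct real zeros")
    x1 = (-a1 - math.sqrt(disc)) / 2.0
    x2 = (-a1 + math.sqrt(disc)) / 2.0
    # Weights from the first two equations of (8.15.5).
    W1 = (M1 - M0 * x2) / (x1 - x2)
    W2 = M0 - W1
    return x1, x2, W1, W2

# w(x) = 1 on [-1, 1]: M_r = 2/(r + 1) for even r, 0 for odd r.
print(gauss_two_point(2.0, 0.0, 2.0 / 3.0, 0.0))
```

For w(x) = 1 on [−1, 1] this reproduces the Legendre-Gauss two-point formula, with abscissas ±1/√3 and unit weights.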

The generalization is obvious since, in the general case, π(x) will be specified
by m α's and the 2m equations replacing (8.15.5) will provide m sets of m + 1
successive equations, from each of which a linear equation in the α's may be
obtained by the same general procedure as that which led to (8.15.7). These
equations will (generally) determine the α's, after which the abscissas are
obtained as the roots of π(x) = 0 and, finally, the first m of the basic equations
determine the weights.

In the intermediate cases, in which, say, m − r of the m abscissas are
preassigned, we can hope only for a degree of precision m + r − 1 (unless
those abscissas are assigned in a special way), and hence there will be m + r
basic equations replacing (8.15.5). If we again let π(x) denote the product
(x − x_1)(x − x_2) ··· (x − x_m), involving the fixed abscissas as well as the free
ones, then π(x) again will be specified by m α's. From the m + r basic equations,
we can proceed as in the derivation of (8.15.7) r times, and hence can obtain r
linear equations in the α's. The m − r additional linear equations needed for
the determination of the m α's then follow from the requirements that the
m − r fixed abscissas satisfy the equation π(x) = 0.

Thus, in the case m = 2, r = 1, the three basic equations are

    W_1 + W_2 = M_0
    W_1 x_1 + W_2 x_2 = M_1            (8.15.9)
    W_1 x_1^2 + W_2 x_2^2 = M_2

Under the assumption that x_1 and x_2 satisfy the equation

    x^2 + \alpha_1 x + \alpha_2 = 0        (8.15.10)

we again obtain (8.15.7). By combining this condition with the requirement
that the preassigned value x_2 satisfy (8.15.10), we deduce that α_1 and α_2 are
determined uniquely by the two linear equations

    M_2 + M_1\alpha_1 + M_0\alpha_2 = 0
                                        (8.15.11)
    x_2^2 + x_2\alpha_1 + \alpha_2 = 0

under the assumption that M_0 x_2 ≠ M_1.

There is no guarantee in this case, even though it be true that w(x) ≥ 0,
that the zeros of π(x) will be real and distinct or, if so, that they will lie in
[a, b]. However, if a quadrature formula of the type sought exists, it can be
obtained by the method outlined.

As a simple illustrative example, we suppose that a quadrature formula
is required to be of the form

    \int_0^1 x^{1/2} f(x)\,dx = W_1 f(x_1) + W_2 f(1) + E        (8.15.12)

where x_2 = 1 is preassigned. The expected degree of precision is then 2,
corresponding to the fact that three free parameters x_1, W_1, and W_2 are available.
We first calculate the relevant moments

    M_r = \int_0^1 x^{(2r+1)/2}\,dx = \frac{2}{2r + 3}    (r = 0, 1, 2)

after which the three basic conditions (8.15.9) become

    W_1 + W_2 = \tfrac{2}{3}    W_1 x_1 + W_2 = \tfrac{2}{5}    W_1 x_1^2 + W_2 = \tfrac{2}{7}        (8.15.13)

By writing π(x) = (x − x_1)(x − 1) = x² + α_1 x + α_2 we deduce from
(8.15.11) that α_1 and α_2 must satisfy the equations

    \tfrac{2}{7} + \tfrac{2}{5}\alpha_1 + \tfrac{2}{3}\alpha_2 = 0
    1 + \alpha_1 + \alpha_2 = 0

and obtain α_1 = −10/7, α_2 = 3/7, and hence π(x) = (7x² − 10x + 3)/7. Thus
there follows

    x_1 = \tfrac{3}{7}    x_2 = 1        (8.15.14)

    W_1 = \tfrac{7}{15}    W_2 = \tfrac{1}{5}        (8.15.15)

Thus (8.15.12) becomes

    \int_0^1 x^{1/2} f(x)\,dx = \tfrac{7}{15} f(\tfrac{3}{7}) + \tfrac{1}{5} f(1) + E        (8.15.16)

We verify that E = 0 for f(x) = 1, x, and x², and find that E ≠ 0 when
f(x) = x³. Hence the degree of precision is indeed 2.
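The worked example lends itself to an exact check in rational arithmetic. The sketch below is illustrative only (the helper names `one_fixed_two_point` and `error_for_power` are hypothetical): it eliminates α_2 between the two equations (8.15.11), recovers the free abscissa from the sum of the roots, and evaluates E for the monomials.

```python
from fractions import Fraction

def one_fixed_two_point(M0, M1, M2, x2):
    """Free abscissa x1 and weights W1, W2 when x2 is preassigned (m = 2, r = 1)."""
    # Eliminating alpha_2 = -x2**2 - x2*alpha_1 from (8.15.11):
    a1 = (M0 * x2 * x2 - M2) / (M1 - M0 * x2)
    x1 = -a1 - x2                      # sum of the roots of pi(x) is -alpha_1
    W1 = (M1 - M0 * x2) / (x1 - x2)    # from the first two basic equations
    W2 = M0 - W1
    return x1, W1, W2

def moment(r):
    """M_r for w(x) = x**(1/2) on [0, 1]."""
    return Fraction(2, 2 * r + 3)

x1, W1, W2 = one_fixed_two_point(moment(0), moment(1), moment(2), Fraction(1))

def error_for_power(r):
    """E in (8.15.16) for f(x) = x**r."""
    return moment(r) - (W1 * x1**r + W2)

print(x1, W1, W2, [error_for_power(r) for r in range(4)])
```

The error for f(x) = x³ comes out as −32/2205, which is 3! times the coefficient −16/6615 of f'''(ξ) that results from the error analysis.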

In order to obtain an expression for the error term E, we may make use
of one of the methods of Sec. 5.10. In particular, the influence function (5.10.16)
with N = 2 is readily determined in the form

    G(s) = -\tfrac{8}{105}\,s^{7/2}                                    (0 ≤ s ≤ 3/7)
                                                                        (8.15.17)
    G(s) = -\tfrac{1}{210}\,(16 s^{7/2} - 49 s^2 + 42 s - 9)            (3/7 ≤ s ≤ 1)

and is found to be negative throughout the interior of the interval [0, 1].
Thus the formula (5.10.31) can be used to give

    E = f'''(\xi)\int_0^1 G(s)\,ds = -\tfrac{16}{6615}\,f'''(\xi)        (8.15.18)

where 0 < ξ < 1.
The same result can be obtained somewhat more easily by use of the third
method described in Sec. 5.10. For we find that

    w(x)\pi(x) = \left[\tfrac{2}{7}\,x^{3/2}(x - 1)^2\right]'

where the constant of integration is determined so that the content of the brackets
vanishes when x = 0. Since it vanishes also when x = 1 and is positive for all
intermediate values of x, we may make use of (5.10.38), with n + 1 = m = 2,
r = 1, and V = (2/7) x^{3/2} (x − 1)², to deduce that

    E = -\frac{f'''(\xi)}{6}\int_0^1 \tfrac{2}{7}\,x^{3/2}(x - 1)^2\,dx = -\tfrac{16}{6615}\,f'''(\xi)
as before.

Finally, in the general case of Chebyshev quadrature, in which all the
weights are to be equal, the formula is of the form

    \int_a^b w(x) f(x)\,dx = W \sum_{k=1}^{m} f(x_k) + E        (8.15.19)

and the m + 1 conditions requiring that the degree of precision be at least m are
of the form

    x_1^0 + x_2^0 + x_3^0 + \cdots + x_m^0 = \bar{M}_0
    x_1 + x_2 + x_3 + \cdots + x_m = \bar{M}_1
    x_1^2 + x_2^2 + x_3^2 + \cdots + x_m^2 = \bar{M}_2        (8.15.20)
    \cdots
    x_1^m + x_2^m + x_3^m + \cdots + x_m^m = \bar{M}_m

where we have written

    \bar{M}_r = \frac{1}{W} M_r = \frac{1}{W}\int_a^b x^r w(x)\,dx        (8.15.21)

From the first equation, there follows immediately \bar{M}_0 = m, and hence

    W = \frac{1}{m}\int_a^b w(x)\,dx = \frac{1}{m} M_0        (8.15.22)

Under the assumption that the problem possesses a (real) solution,† we again
write

    \pi(x) = (x - x_1)(x - x_2) \cdots (x - x_m)
           = x^m + \alpha_1 x^{m-1} + \alpha_2 x^{m-2} + \cdots + \alpha_{m-1} x + \alpha_m        (8.15.23)

and attempt to determine the m coefficients α_1, ..., α_m.

First, by multiplying the first equation in (8.15.20) by α_m, the second by
α_{m−1}, ..., the next-to-last by α_1, and the last by 1, adding the results, and using
the fact that each x_i satisfies π(x) = 0, we obtain one linear equation relating
the α's in the form

    \bar{M}_0\alpha_m + \bar{M}_1\alpha_{m-1} + \bar{M}_2\alpha_{m-2} + \cdots + \bar{M}_{m-1}\alpha_1 + \bar{M}_m = 0        (8.15.24)

In order to obtain m − 1 complementary relations, we make use of Newton's
power-sum identities [see Theorem 13 of Sec. 1.9, with s_r, a_r, and n replaced by
\bar{M}_r, α_r, and m] to deduce that

    r\alpha_r + \bar{M}_1\alpha_{r-1} + \bar{M}_2\alpha_{r-2} + \cdots + \bar{M}_{r-1}\alpha_1 + \bar{M}_r = 0    (r = 1, 2, ..., m)        (8.15.25)

This recurrence formula, which includes (8.15.24) when r = m, permits the
expression of each of the α's in terms of the reduced moments \bar{M}_1, \bar{M}_2, ...,
and \bar{M}_m. The required abscissas then are the roots of the equation

    x^m + \alpha_1 x^{m-1} + \alpha_2 x^{m-2} + \cdots + \alpha_{m-1} x + \alpha_m = 0        (8.15.26)

if those roots are real and distinct. Otherwise, the desired formula does not exist.

† As was pointed out in Sec. 8.14, this assumption is not always valid.
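The recurrence (8.15.25) translates directly into a short loop. The following sketch, with the hypothetical name `chebyshev_coefficients`, takes the reduced moments as exact rationals and returns α_1, ..., α_m. Finding the roots of (8.15.26) is left to whatever polynomial solver is at hand.

```python
from fractions import Fraction

def chebyshev_coefficients(reduced_moments):
    """alpha_1..alpha_m from reduced moments [Mbar_0, ..., Mbar_m] via (8.15.25)."""
    m = len(reduced_moments) - 1
    alpha = [Fraction(1)]  # alpha[0] is a placeholder so that alpha[r] is alpha_r
    for r in range(1, m + 1):
        s = reduced_moments[r] + sum(
            reduced_moments[j] * alpha[r - j] for j in range(1, r)
        )
        alpha.append(-Fraction(s) / r)
    return alpha[1:]

# w(x) = 1 on [-1, 1], m = 4: W = 2/m, so Mbar_r = m/(r + 1) for even r, 0 for odd r.
print(chebyshev_coefficients([4, 0, Fraction(4, 3), 0, Fraction(4, 5)]))
```

For w(x) = 1 and m = 4 the result corresponds to x⁴ − (2/3)x² + 1/45, whose zeros are the m = 4 Chebyshev abscissas ±0.187592 and ±0.794654.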
In illustration, in order to determine the Chebyshev abscissas for the
quadrature formula

    \int_{-1}^{1} (1 - x^2) f(x)\,dx = W[f(x_1) + f(x_2) + f(x_3)] + E        (8.15.27)

we first calculate the common weight

    W = \tfrac{1}{3}\int_{-1}^{1} (1 - x^2)\,dx = \tfrac{4}{9}

and then the relevant reduced moments

    \bar{M}_1 = \tfrac{9}{4}\int_{-1}^{1} x(1 - x^2)\,dx = 0        \bar{M}_2 = \tfrac{9}{4}\int_{-1}^{1} x^2(1 - x^2)\,dx = \tfrac{3}{5}

    \bar{M}_3 = \tfrac{9}{4}\int_{-1}^{1} x^3(1 - x^2)\,dx = 0

Next, from (8.15.25) with r = 1, 2, and 3, there follows

    \alpha_1 = -\bar{M}_1 = 0        \alpha_2 = \tfrac{1}{2}(-\bar{M}_1\alpha_1 - \bar{M}_2) = -\tfrac{3}{10}

    \alpha_3 = \tfrac{1}{3}(-\bar{M}_1\alpha_2 - \bar{M}_2\alpha_1 - \bar{M}_3) = 0

Hence the required abscissas are obtained as the roots of the equation
x³ − (3/10)x = 0, in the form

    x_1 = -\sqrt{3/10}        x_2 = 0        x_3 = \sqrt{3/10}        (8.15.28)

so that the desired formula is

    \int_{-1}^{1} (1 - x^2) f(x)\,dx = \tfrac{4}{9}\left[f(-\sqrt{3/10}) + f(0) + f(\sqrt{3/10})\right] + E        (8.15.29)

It is easily verified that E = 0 when f(x) = 1, x, x², and x³, but that E ≠ 0
when f(x) = x⁴, so that the degree of precision is 3.

An expression for the error term is obtained most readily by use of the
third method of Sec. 5.10. Thus, we find that

    \int_{-1}^{x} w(t)\pi(t)\,dt = \int_{-1}^{x} (t^3 - \tfrac{3}{10}t)(1 - t^2)\,dt
        = -\tfrac{1}{120}(20x^6 - 39x^4 + 18x^2 + 1)
        = -\tfrac{1}{120}(x^2 - 1)^2(20x^2 + 1)

so that w(x)π(x) = V'(x), where V(x) = −(1/120)(x² − 1)²(20x² + 1). Since
V(x) vanishes at both ends of the interval of integration and is of constant sign
inside that interval, use may be made of (5.10.38), with n + 1 = m = 3 and
r = 1, to give

    E = -\frac{f^{iv}(\xi)}{4!}\int_{-1}^{1} V(x)\,dx

or

    E = \tfrac{1}{700}\,f^{iv}(\xi)        (8.15.30)

where |ξ| < 1. The same result can be obtained, somewhat more laboriously,
by use of the appropriate influence function (5.10.16) with N = 3. A check
is afforded by an application of the formula to f(x) = x⁴.
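The check suggested in the last sentence can be carried out in a few lines. In this minimal sketch (function names are illustrative), the error of the three-point equal-weight formula for the weight 1 − x² is evaluated for f(x) = x^r. For r = 4, where the fourth derivative is 4! = 24, it should equal 24 times the coefficient of f^{iv}(ξ), that is, 24/700.

```python
import math

W = 4.0 / 9.0
NODES = [-math.sqrt(0.3), 0.0, math.sqrt(0.3)]

def weighted_moment(r):
    """Integral of x**r * (1 - x**2) over [-1, 1]."""
    return 0.0 if r % 2 else 2.0 / (r + 1) - 2.0 / (r + 3)

def error_for_power(r):
    """E for f(x) = x**r in the three-point equal-weight formula."""
    return weighted_moment(r) - W * sum(x**r for x in NODES)

print([error_for_power(r) for r in range(5)], 24.0 / 700.0)
```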

In applications of the methods of this section, when m is large and the
relevant set of m linear equations determining the coefficients α_1, ..., α_m
which specify π(x) is to be solved by numerical methods, proper account must
be taken of the fact that small errors in the α's can propagate into large errors
in the finally calculated zeros of π(x) when π(x) is an ill-conditioned polynomial
(see Sec. 10.5).


8.16 Application to Trigonometric Integrals 


To conclude this chapter, we illustrate the use of the methods of the preceding
section in connection with the derivation of formulas for the approximate
evaluation of the integrals

    S_k = \int_0^{\pi} F(\theta)\sin k\theta\,d\theta        C_k = \int_0^{\pi} F(\theta)\cos k\theta\,d\theta        (8.16.1)

when k is a positive integer.

In the case of S_k, if we consider sin kθ as a weighting function w on the
interval [0, π], we notice that w changes sign inside that interval unless k = 1
and, in fact, that w oscillates rather violently if k is large. To deal with this
oscillation, we may divide [0, π] into the k subintervals [0, π/k], [π/k, 2π/k],
..., [π − π/k, π], in each of which sin kθ is of constant sign, and derive a
gaussian formula for each subinterval. Thus we express S_k in the form

    S_k = \sum_{n=0}^{k-1}\int_{n\pi/k}^{(n+1)\pi/k} F(\theta)\sin k\theta\,d\theta
        = \frac{1}{k}\sum_{n=0}^{k-1} (-1)^n \int_0^{\pi} F\!\left(\frac{n\pi + x}{k}\right)\sin x\,dx        (8.16.2)

with the substitution x = kθ − nπ, and we are led to consider the approximation
of the integral

    \int_0^{\pi} f(x)\sin x\,dx        (8.16.3)

by gaussian quadrature, with the nonnegative weighting function w = sin x.
For simplicity, we restrict attention here to the use of only two abscissas
in this approximation. In this case (see Prob. 51) the methods of the preceding
section give

    W_1 = W_2 = 1        (8.16.4)

and require that x_1 and x_2 be the zeros of the polynomial

    \pi(x) = x^2 - \pi x + 2        (8.16.5)

so that, with a convenient notation, there follows

    x_1 = \frac{\pi}{4} + \alpha        x_2 = \frac{3\pi}{4} - \alpha        (8.16.6)

where

    \alpha = \frac{\pi}{4} - \frac{1}{2}\sqrt{\pi^2 - 8} ≈ 0.1017308        (8.16.7)

Also, according to (8.3.6), the relevant error term is

    E = \frac{f^{iv}(\xi)}{24}\int_0^{\pi} (x^2 - \pi x + 2)^2 \sin x\,dx

or

    E = K_1 f^{iv}(\xi)        (8.16.8)

where

    K_1 = \frac{10 - \pi^2}{6} ≈ 0.022        (8.16.9)

and where 0 < ξ < π, so that the desired approximation to (8.16.3) takes the
form

    \int_0^{\pi} f(x)\sin x\,dx = f\!\left(\frac{\pi}{4} + \alpha\right) + f\!\left(\frac{3\pi}{4} - \alpha\right) + K_1 f^{iv}(\xi)        (8.16.10)
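The constants of the two-point formula for the weight sin x can be confirmed numerically. The sketch below is an illustration with variable names chosen for the example: it takes the abscissas as the zeros of x² − πx + 2, forms the offset from π/4, and obtains the error constant as E/4! for f(x) = x⁴, using the elementary value π⁴ − 12π² + 48 for the integral of x⁴ sin x over [0, π].

```python
import math

PI = math.pi

# Zeros of pi(x) = x**2 - PI*x + 2.
root = math.sqrt(PI**2 - 8.0)
x1 = (PI - root) / 2.0
x2 = (PI + root) / 2.0
alpha = x1 - PI / 4.0

# Fourth moment of sin x on [0, PI], from repeated integration by parts.
M4 = PI**4 - 12.0 * PI**2 + 48.0

# With unit weights, E for f(x) = x**4 is M4 - (x1**4 + x2**4), and f'''' = 24.
K1 = (M4 - (x1**4 + x2**4)) / 24.0

print(alpha, K1)
```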


Hence, if this result is used for each of the k integrals in (8.16.2), there
follows

    \int_0^{\pi} F(\theta)\sin k\theta\,d\theta = \frac{1}{k}\sum_{n=0}^{k-1} (-1)^n \left[F\!\left(\frac{n\pi + \pi/4 + \alpha}{k}\right) + F\!\left(\frac{n\pi + 3\pi/4 - \alpha}{k}\right)\right] + E        (8.16.11)

where

    E = K_1 k^{-5}\sum_{n=0}^{k-1} (-1)^n F^{iv}\!\left(\frac{\xi_n}{k}\right)        (8.16.12)

and where nπ < ξ_n < (n + 1)π.

The result of ignoring the error term in (8.16.11) accordingly yields
an approximation to the integral S_k with a degree of precision equal to 3. It
uses 2k values of F (two per half-period of sin kθ) and has an error term
which is comparable (on the average, about twice as large) with the error term
associated with a similar (but nongaussian) formula due to Price [1960], which
employs 3k + 2 values of F at equally spaced points.
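A minimal sketch of the piecewise-gaussian sum for S_k follows, with the error term dropped. The function names are hypothetical. Since the degree of precision is 3, the sum should reproduce the integral exactly (to rounding) when F(θ) = θ³, for which the closed form cos kπ (6π/k³ − π³/k) is elementary.

```python
import math

PI = math.pi
ALPHA = PI / 4.0 - 0.5 * math.sqrt(PI**2 - 8.0)  # offset of the abscissas from pi/4, 3*pi/4

def s_k_approx(F, k):
    """Two-point-per-half-period approximation to the integral of F(t)*sin(k*t) on [0, pi]."""
    total = 0.0
    for n in range(k):
        sign = -1.0 if n % 2 else 1.0
        total += sign * (F((n * PI + PI / 4.0 + ALPHA) / k)
                         + F((n * PI + 3.0 * PI / 4.0 - ALPHA) / k))
    return total / k

def s_k_exact_cubic(k):
    """Closed form of the integral of t**3 * sin(k*t) over [0, pi]."""
    return math.cos(k * PI) * (6.0 * PI / k**3 - PI**3 / k)

for k in (1, 4, 7):
    print(k, s_k_approx(lambda t: t**3, k), s_k_exact_cubic(k))
```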

In the case of C_k, a corresponding subdivision of [0, π], defined now by the
zeros of cos kθ, would yield the k + 1 subintervals [0, π/2k], [π/2k, 3π/2k],
..., [π − π/2k, π], and the two end subintervals would require separate
treatment. However, it happens that a piecewise-gaussian approximation to
C_k can be obtained with the same uniform subdivision as that used for S_k, in
spite of the sign change of the weighting function cos kθ at the midpoint of
each subinterval. In fact, the approximation so obtained is found to possess
a remarkable additional property.

Accordingly, we first write C_k in a form analogous to (8.16.2)

    C_k = \frac{1}{k}\sum_{n=0}^{k-1} (-1)^n \int_0^{\pi} F\!\left(\frac{n\pi + x}{k}\right)\cos x\,dx        (8.16.13)

and seek a two-point gaussian formula for the integral

    \int_0^{\pi} f(x)\cos x\,dx        (8.16.14)

with no assurance of success because of the sign change in w = cos x at
x = π/2. The relevant equations (8.15.5) are found to yield the polynomial

    \pi(x) = x^2 - \pi x + 6 - \tfrac{1}{2}\pi^2        (8.16.15)

whose zeros are to be the required abscissas, and there follows

    x_1 = \frac{\pi}{8} - \beta        x_2 = \frac{7\pi}{8} + \beta        (8.16.16)

where

    \beta = \frac{1}{2}\sqrt{3\pi^2 - 24} - \frac{3\pi}{8} ≈ 0.0060494        (8.16.17)

Thus we are fortunate in that x_1 and x_2 not only are real, but also lie in [0, π].




The associated weights are of equal magnitude and opposite sign (as might have
been anticipated in view of the antisymmetry of the weighting function):

    W_1 = -W_2 = \frac{1}{3\pi/8 + \beta} ≈ 0.8444900        (8.16.18)
Finally, the error term takes the form

    E = \int_0^{\pi} [\pi(x)]^2 \cos x\; f[x_1, x_1, x_2, x_2, x]\,dx        (8.16.19)

and, since cos x changes sign at x = π/2, the second law of the mean is not
applicable. However, if we write

    [\pi(x)]^2 \cos x = V'(x)        V(x) = \int_0^{x} [\pi(s)]^2 \cos s\,ds        (8.16.20)

in preparation for an integration by parts, we may notice that since V'(x) ≥ 0
when 0 ≤ x ≤ π/2 and V'(x) ≤ 0 when π/2 ≤ x ≤ π, V(x) increases steadily
with increasing x from zero at x = 0 to a positive maximum at x = π/2 and
then decreases steadily as x continues to increase from π/2 to π. But since clearly
V'(π − x) = −V'(x), the total decrease in V(x) as x varies from π/2 to π
is exactly equal to its increase as x varies from 0 to π/2. Hence we conclude that
the function V(x) defined by (8.16.20) is positive for 0 < x < π and zero for
x = 0 and x = π.
Consequently, an integration by parts transforms (8.16.19) to the form

    E = -\int_0^{\pi} V(x)\, f[x_1, x_1, x_2, x_2, x, x]\,dx

and hence, by use of the second law of the mean, to the form

    E = -\frac{f^{v}(\xi)}{120}\int_0^{\pi} V(x)\,dx        (8.16.21)

where 0 < ξ < π. Finally, the computation of the integral in (8.16.21) can be
avoided, as in the transition from (5.10.30) to (5.10.31), by noticing that it must
be equal to −E when f(x) = x⁵. In this way we find that

    E = -K_2 f^{v}(\xi)        (8.16.22)

where

    K_2 = \frac{336 - 24\pi^2 - \pi^4}{240} ≈ 0.0072        (8.16.23)
and where 0 < ξ < π.

Thus the two-point formula

    \int_0^{\pi} f(x)\cos x\,dx = \frac{1}{3\pi/8 + \beta}\left[f\!\left(\frac{\pi}{8} - \beta\right) - f\!\left(\frac{7\pi}{8} + \beta\right)\right] - K_2 f^{v}(\xi)        (8.16.24)

has in fact a degree of precision of 4, one greater than the maximum for a
two-point gaussian formula with a positive weighting function; and the result of
applying (8.16.24) to each integral in (8.16.13) is the formula

    \int_0^{\pi} F(\theta)\cos k\theta\,d\theta = \frac{1}{k(3\pi/8 + \beta)}\sum_{n=0}^{k-1} (-1)^n \left[F\!\left(\frac{n\pi + \pi/8 - \beta}{k}\right) - F\!\left(\frac{n\pi + 7\pi/8 + \beta}{k}\right)\right] + E        (8.16.25)

where

    E = -K_2 k^{-6}\sum_{n=0}^{k-1} (-1)^n F^{v}\!\left(\frac{\xi_n}{k}\right)        (8.16.26)

and where nπ < ξ_n < (n + 1)π.
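The unexpected extra degree of precision can be seen directly. In the sketch below (names are illustrative), the abscissas are written as π/2 ∓ d with d = ½√(3π² − 24), the weight is 1/d, and the moments of cos x on [0, π] are the elementary values obtained by integration by parts. The error should vanish for x⁰, ..., x⁴, and the error for x⁵, divided by −5!, should reproduce the constant of (8.16.23).

```python
import math

PI = math.pi
D = 0.5 * math.sqrt(3.0 * PI**2 - 24.0)   # half the distance between the abscissas
X1, X2 = PI / 2.0 - D, PI / 2.0 + D
W = 1.0 / D                               # W_1 = -W_2 = W

# Moments of cos x on [0, pi] for r = 0..5.
MOMENTS = [
    0.0,
    -2.0,
    -2.0 * PI,
    12.0 - 3.0 * PI**2,
    24.0 * PI - 4.0 * PI**3,
    -5.0 * PI**4 + 60.0 * PI**2 - 240.0,
]

def error_for_power(r):
    """E for f(x) = x**r in the two-point formula with weights +W and -W."""
    return MOMENTS[r] - W * (X1**r - X2**r)

K2 = -error_for_power(5) / 120.0
print([error_for_power(r) for r in range(5)], K2)
```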
In the special case when

    F(\theta) = e^{a\theta}        (8.16.27)

the integrals S_k and C_k are elementary and also the sums in (8.16.11) and
(8.16.25) can be expressed in closed form to give

    S_k = \frac{k}{a^2 + k^2}\left(1 - e^{a\pi}\cos k\pi\right)
        = \frac{\cosh[(\pi/4 - \alpha)a/k]}{k\cosh(\pi a/2k)}\left(1 - e^{a\pi}\cos k\pi\right) + E        (8.16.28)

where an expansion shows that the relative error is such that

    \frac{E}{S_k} = \frac{K_1}{2}\left(\frac{a}{k}\right)^4 + O\!\left[\left(\frac{a}{k}\right)^6\right]        \left(\tfrac{1}{2}K_1 ≈ 0.011\right)        (8.16.29)

and

    C_k = -\frac{a}{a^2 + k^2}\left(1 - e^{a\pi}\cos k\pi\right)
        = -\frac{\sinh[(3\pi/8 + \beta)a/k]}{k(3\pi/8 + \beta)\cosh(\pi a/2k)}\left(1 - e^{a\pi}\cos k\pi\right) + E        (8.16.30)

where

    \frac{E}{C_k} = \frac{K_2}{2}\left(\frac{a}{k}\right)^4 + O\!\left[\left(\frac{a}{k}\right)^6\right]        \left(\tfrac{1}{2}K_2 ≈ 0.0036\right)        (8.16.31)

Thus here the relative error becomes small of the order of (a/k)⁴ in both cases
when |k/a| is large. For example, when |k/a| = 10, the approximation to S_k is
in error by less than two units in the fifth significant figure and the approximation
to C_k by less than four units in the sixth significant figure.
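The quoted relative errors can be reproduced numerically. The following sketch (function names are illustrative) evaluates both piecewise-gaussian sums for F(θ) = e^{aθ} and compares them with the elementary closed forms of S_k and C_k. With a = 1 and k = 10, the relative errors divided by (a/k)⁴ should come out close to 0.011 and 0.0036, respectively.

```python
import math

PI = math.pi
ALPHA = PI / 4.0 - 0.5 * math.sqrt(PI**2 - 8.0)
BETA = 0.5 * math.sqrt(3.0 * PI**2 - 24.0) - 3.0 * PI / 8.0

def s_k_approx(F, k):
    """Piecewise two-point sum for the integral of F(t)*sin(k*t) on [0, pi]."""
    total = sum((-1) ** n * (F((n * PI + PI / 4.0 + ALPHA) / k)
                             + F((n * PI + 3.0 * PI / 4.0 - ALPHA) / k))
                for n in range(k))
    return total / k

def c_k_approx(F, k):
    """Piecewise two-point sum for the integral of F(t)*cos(k*t) on [0, pi]."""
    w = 1.0 / (3.0 * PI / 8.0 + BETA)
    total = sum((-1) ** n * (F((n * PI + PI / 8.0 - BETA) / k)
                             - F((n * PI + 7.0 * PI / 8.0 + BETA) / k))
                for n in range(k))
    return w * total / k

def relative_errors(a, k):
    """(E/S_k, E/C_k) for F(t) = exp(a*t), with E = exact - approximate."""
    F = lambda t: math.exp(a * t)
    tail = 1.0 - math.exp(a * PI) * math.cos(k * PI)
    s_exact = k * tail / (a * a + k * k)
    c_exact = -a * tail / (a * a + k * k)
    return ((s_exact - s_k_approx(F, k)) / s_exact,
            (c_exact - c_k_approx(F, k)) / c_exact)

rel_s, rel_c = relative_errors(1.0, 10)
print(rel_s / 1e-4, rel_c / 1e-4)
```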

The preceding developments generalize without difficulty to the application 
of piecewise-gaussian quadrature employing more than two ordinates per half- 
period of the trigonometric weighting function. The use of a formula of Lobatto 
type over each half-period would afford obvious computational advantages. 
When the sum defined in (8.16.11) or (8.16.25), or in a generalization, involves 
a large number of terms with alternating signs, an accelerating process such as 
Euler’s transformation (Sec. 5.9) may facilitate its evaluation. 


8.17 Supplementary References 


General references on numerical integration, dealing with methods of this 
chapter as well as with many others, include Krylov [1962], Davis and 
Rabinowitz [1967], and Ghizzetti and Ossicini [1970]. Davis and Rabinowitz 
include an extensive bibliography, together with a selection of Fortran programs, 
a list of Algol procedures, and a partial list of existent numerical tables, all 
relevant to numerical integration. Stroud [1961] presents an exhaustive 
bibliography of sources through 1960. 

Classical references on gaussian (and gaussian-type) quadrature include 
Mehler [1864], Chebyshev [1874], and Radau [1880a, 1880b, 1883]. More 
recent contributions are by Shohat and Winston [1934], Winston [1934], and 
Bernstein [1937]. Stroud and Secrest [1966] provide a comprehensive collection 
of formulas of gaussian type, together with extensive tables of abscissas and 
weights for many of the formulas, including all the standard ones. Shorter 
tables are included in the NBS handbook [1964] and in Krylov [1962]. Other 
tables are listed in the index by Fletcher et al. [1962]. 

The relationship between gaussian quadrature and osculating (Hermite) 
quadrature is pointed out by Fort [1948]. Salzer [1954] presents a table of 
coefficients for osculating quadrature. Stieltjes [1895] related gaussian quadra- 
ture to continued fractions (see Cheney [1966], pp. 186-188), while Ghizzetti 
and Ossicini [1970] employ a quite different approach due to Radon [1935]. 
For methods of algebraic derivation, see Beard [1947] and Hamming [1962]. 

Convergence of gaussian-type quadrature sequences was established by 
Stieltjes [1884] on a finite interval, when f(x) is continuous, and by Uspensky 
[1928] in special cases involving infinite intervals. For a stronger result when the 
interval is finite, see Davis and Rabinowitz [1967].


PROBLEMS
Section 8.2 


1 Obtain the formula

    f_s = (1 + 2s)(1 - s)^2 f_0 + (3 - 2s)s^2 f_1 + s(1 - s)^2 h f_0'
          - s^2(1 - s) h f_1' + \frac{h^4}{24} f^{iv}(\xi)\, s^2 (1 - s)^2

where f_s = f(x_0 + hs) and x_0 < ξ < x_0 + h, if 0 ≤ s ≤ 1, and deduce the
special formula

    f_{1/2} = \tfrac{1}{2}(f_0 + f_1) + \frac{h}{8}(f_0' - f_1') + \frac{h^4}{384} f^{iv}(\xi)

2 Obtain the formula

    f_s = \tfrac{1}{4}(4 + 3s)s^2(1 - s)^2 f_{-1} + (1 - s^2)^2 f_0 + \tfrac{1}{4}(4 - 3s)s^2(1 + s)^2 f_1
          + \tfrac{1}{4}s^2(1 + s)(1 - s)^2 h f_{-1}' + s(1 - s^2)^2 h f_0'
          - \tfrac{1}{4}s^2(1 - s)(1 + s)^2 h f_1' + \frac{h^6}{720} f^{vi}(\xi)\, s^2 (1 - s^2)^2

where f_s = f(x_0 + hs) and x_0 − h < ξ < x_0 + h, if |s| ≤ 1, and deduce the
special formula

    f_{1/2} = \tfrac{1}{128}(11 f_{-1} + 72 f_0 + 45 f_1)
              + \frac{3h}{128}(f_{-1}' + 12 f_0' - 3 f_1') + \frac{h^6}{5120} f^{vi}(\xi)

together with a corresponding formula for f_{−1/2}.
3 From the following tabular values of the function

    Si(x) = \int_0^{x} \frac{\sin t}{t}\,dt

determine approximate values of Si(2.5) and Si(3.5) by use of the formulas of
Probs. 1 and 2:


x 2.0 3.0 4.0 
Si(x) 1.605 1.849 1.758 


Section 8.3 


4 From the results of Prob. 1, deduce the formulas

    \int_{x_0}^{x_1} f(x)\,dx = \frac{h}{2}(f_0 + f_1) + \frac{h^2}{12}(f_0' - f_1') + \frac{h^5}{720} f^{iv}(\xi)

and

    \int_{x_0}^{x_1} (x - x_0) f(x)\,dx = \frac{h^2}{20}(3f_0 + 7f_1) + \frac{h^3}{60}(2f_0' - 3f_1') + \frac{h^6}{1440} f^{iv}(\xi)

where h = x_1 − x_0 and x_0 < ξ < x_1 in each formula.

5 From the results of Prob. 2, deduce the formula

    \int_{x_{-1}}^{x_1} f(x)\,dx = \frac{h}{15}(7f_{-1} + 16f_0 + 7f_1) + \frac{h^2}{15}(f_{-1}' - f_1') + \frac{h^7}{4725} f^{vi}(\xi)

where x_1 = x_0 + h = x_{−1} + 2h and x_{−1} < ξ < x_1.

6 Use the data given in Prob. 3 and the first formula of Prob. 4 to obtain
approximate values of the integral ∫_a^b Si(x) dx for [a, b] = [2, 3] and [3, 4], and
compare the sum with the result given by the formula of Prob. 5.


Section 8.4 


7 By specializing f(x) to l_i(x) in (8.4.6), prove that H_i = W_i for a gaussian
quadrature formula.

8 With the notation of Sec. 8.4, suppose that φ_k(x) satisfies the recurrence formula


bia (x) = (aux + dy) φκ(χ)ὺ + Ce Px—1@) (kK 2 2) 
with $o(x) = Ag and ¢,(x) = Ayx + B,. If 4x) is defined by the relation 


6,(x) as [ w(y) oy) oe 9 (x) dy 
a yx 


show that 6,(x) satisfies the same recurrence formula 
Ox 4.1(0) = (a,x + δ θμίχ) + CO, — 1%) (k 2 2) 


with the modified starting values 09(x) = 0 and 0,(x) = 44γ0ο. Show also that 
the gaussian weight H;, defined by (8.4.16), can be expressed in the form 


i On Xi) 
bh (xi) 


[In the expression for 6,4 1(x), replace ¢,4, by the appropriate combination of 
φι and ¢,—1, write a,y = a,(y — x) + a,x, and recall that [δ wd, dx = 0 when 
k > 0. Notice that if ¢,(x) is replaced by its monic multiple d,(x), as is con- 
venient when the relevant orthogonal polynomials are generated recursively 
(Sec. 7.10), then a, is to be replaced by 1 in both of the above recurrence formulas. ] 
If [a, b] = [0, 1] and w(x) = x, show that the abscissas in (8.4.6) are the zeros 
of the polynomial 


i x} an 


oe ee Ie: μι Ὁ 11 m 
ai a ard 


P(x) = 
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that the ith weight is given by 
2m+ 1 


ἜΓΙΞΕΞΣ εὐ ε ξξ Ξ ΞΘ ἐκ σιαι, ἐς 
mm + 1)? 64,(%)) bm καρ 


and that the error term is of the form 


_— m+i1 (m!)* (2m) 
7 2(2m + 1)? ΤΙΝ (ὦ) 


In particular, obtain the formulas 


1 
Ϊ xf(x) dx = 32 + AF" 


0 


and 
ἢ .9- ν6 (6- νόλ 9.-.ν6 .(6-- V6 1 ay 
[9 4 = So 10 )+ 36 γί 10 ) + raat ©) 


10 Rederive the abscissas and weights for the quadrature formulas obtained in Prob. 9 


when m = 1 and when m = 2 by use of the methods of Sec. 7.10 and of Prob. 8. 


Section 8.5 


11 Rederive the Legendre-Gauss two- and three-point formulas by use of the methods
of Sec. 7.10 and of Prob. 8.

12 Use a Legendre-Gauss three-point formula to show that

    \int_{-1}^{1} e^{-a^2 x^2}\,dx = \tfrac{2}{9}\left(4 + 5e^{-3a^2/5}\right) + E

where

    E = \frac{a^6}{15750}\,e^{-a^2\theta^2} H_6(a\theta)        (|\theta| < 1)

and where H_6 is a Hermite polynomial.

13 After making an appropriate linear change of variables, determine approximate
values of the integral

    \int_2^8 \frac{dx}{x}

by use of gaussian formulas involving two, three, four, and five ordinates, and
compare the approximations with those afforded by corresponding Newton-Cotes
formulas. In each case, obtain an upper bound on the error analytically
and verify that it is conservative.

14 Proceed as in Prob. 13 with the integral

    \int_0^{\pi/2} \sin x\,dx

15 Proceed as in Prob. 13 with the integral

    \int_0^{\pi} \sin x\,dx

16 Obtain approximations to the integral

    \int_{-4}^{4} \frac{dx}{1 + x^2}

using gaussian quadratures with two, three, four, and five points, and compare the
results with the true value (see also Sec. 3.9).


Section 8.6 


17 Use a Laguerre-Gauss two-point formula to show that 


| 6 ατ3 εὖ 


oer ae  Π ἢ 
9 x ta α2 41: 2 a 


(0 «θ «1) 


when a > 0. 
18 Determine approximate values of the integral

    \int_0^{\infty} e^{-x}\sin x\,dx

by use of Laguerre-Gauss quadratures employing two, three, four, and five
ordinates. In each case, obtain an upper bound on the error and verify that it is
conservative.

19 Proceed as in Prob. 18 with the integral

    \int_0^{\infty} \frac{e^{-x}}{x + 4}\,dx = 0.206346

20 Proceed as in Prob. 18 with respect to the integral

    \int_0^{\infty} \frac{x^{1/2} e^{-x}}{x + 4}\,dx = 0.16776

omitting the analytical determination of error bounds.
2] Derive the results of (8.6.17) to (8.6.19), and obtain the special formula 
rd + 


[, xPe*f(x) dx = a (2+ B-V2+ BSQ+ B+V2+ β) 


0 2(2 + 
ΚΕΤΞΞΤΙΌΡ --.----.- Γ( 
42+ Btv2+ BS2+ B- V2 + B)] + ree @ 


where B > —1 and € > 0, in the case when m = 2. Also use the two-point 
formula to approximate the integral in Prob. 20, and compare the result with that 
afforded by the two-point Laguerre-Gauss formula. 
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Section 8.7 


22 Determine approximate values of the integral

    \int_{-\infty}^{\infty} e^{-x^2}\cos x\,dx = \sqrt{\pi}\,e^{-1/4}

by use of Hermite-Gauss quadratures employing two, three, four, and five
ordinates. In each case, obtain an upper bound on the error and verify that it is
conservative.

23 Transform the integral of Prob. 20 to the form

    \int_{-\infty}^{\infty} \frac{t^2 e^{-t^2}}{t^2 + 4}\,dt

and determine approximate values by use of Hermite-Gauss quadratures employing
two, three, four, and five ordinates. Also compare the results with the
corresponding results in Prob. 20.

24 From the following tabulated rounded values of J_0(x), together with the fact
that J_0(−x) = J_0(x), determine approximate values of the integral

    \int_{-\infty}^{\infty} e^{-x^2} J_0(x)\,dx = 1.570301

by use of Hermite-Gauss quadrature employing two, three, four, and five ordinates:

    x         0.0        0.5        1.0        1.5        2.0        2.5
    J_0(x)    1.000000   0.938470   0.765198   0.511828   0.223891   −0.048384
Section 8.8 
25 Use a Chebyshev-Gauss three-point formula to show that 
1 2 6 
see i 2 cosh V3 +— (2) 2 (μ]) < 1) 
LFA) Woes ἊΣ 3 2 360 \2 
26 Determine approximate values of the integral

    \int_{-1}^{1} \frac{\cos x}{\sqrt{1 - x^2}}\,dx = \pi J_0(1) = 2.40394

by use of Chebyshev-Gauss quadratures employing two, three, four, and five 
ordinates. In each case, obtain an upper bound on the error and verify that it is 
conservative. 
27 Determine approximate values of the integral

    \int_{-1}^{1} \frac{dx}{\sqrt{(1 - x^2)(16 - x^2)}}

by use of Chebyshev-Gauss quadratures employing two, three, four, and five
ordinates.


28 Use the results of Prob. 33 of Chap. 7 to deduce the quadrature formula 


1 eae m 
1..: 2 dy = π . 2 kn ἀπ 
[¥ x* f(x) dx Gia (SG) 1 (e085) 
π fm) 


gam+i (2m)! 
29 Proceed as in Prob. 26, using the formula of Prob. 28 to deal with the integral 


(ἰξ] < 1) 


    \int_{-1}^{1} \sqrt{1 - x^2}\,\cos x\,dx = 1.38246


Section 8.9 


30 Determine, to six decimal places, the abscissas and weights in a formula 
1 
| (1 + x)" f(x) de = Hy fey) + Heft.) + E 
-1 


with degree of precision equal to 3, and obtain an expression for the error in 
terms of f'”. Also transform the results into a formula of the form 


1 
Ϊ xV/2F(x) dx = H{F(x}) + Η2ΖΕ(Χ2) + E' 
0 


31 By making appropriate use of (8.9.12), obtain the quadrature formula 
1 ᾿ m 
| ᾳ -- αἾὟγ0) de = FHS) +E 
-1 = 


when ἡ is a nonnegative integer, with x; the ith zero of the polynomial 


by(x) = P(x) =H Pl) 
dx 
and with 
7 2(m + 2n)! 
mt (1 — x?) [PEEP OD? 
and 


; | 
fOPE) el < 1) 


2m + 2n+ 1 


_m!(m + 2n)! [Gm + η)} 1’ 2657)’ 
(2m)! (2m + 2n)! 


Section 8.10 


32 Suppose that a quadrature formula of the form 


Ϊ “δ ΓΟ) dx = Κλ ΚΟ) + > We) + E 
} 2 
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is required, with the abscissa x, = O assigned. Show that the m — 1 free abscissas 
should then be zeros of the polynomial ¢,,_ ,(x), where 


r 
d(x) = Cx ter © fart te-¥] 
dx" 


I 
CY 
u 
ἐν 
4 


rox ἀπο ν δίυυς 
τι (δ ce er: (x"e | 


if the degree of precision is to be maximized. Verify the relation 


ε΄ 1 


x d 
—— (x"e"™*) = —--e *— Lx 
dx" ( r dx I 
and hence, by taking C, = 1, deduce that the free abscissas must be zeros of 


Pm—1(X) = Em—1(*) = Lim—1(X) 


where the prime denotes differentiation. 
33 For the set of polynomials ¢,(x) obtained in Prob. 32, show that 


A =(-1I¥ F%=rlr 4D! 


and hence deduce that the weights associated with the free abscissas are given by 


(m — 1)! m! . 
ι-Ξ---.-.ΘΞΘ.ΘΘ.-.ἙΦὦἙἜ (i # 1) 
Xi Pn —1(%i) Om(Xi) 
Show also that 
-- 1 Ὁ 4 1 
ee — je 1,..--.(Χ}] dx = — 
"350 ). ax ie m 


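34 Show that the error term in the formula of Prob. 32 can be obtained by the
following steps:

$$\begin{aligned}
E &= (-1)^{m-1} \int_0^{\infty} e^{-x}\, x\, \phi_{m-1}(x)\, f[x_1, \ldots, x_m, x]\, dx \\
&= (-1)^{m-1} \int_0^{\infty} f[x_1, \ldots, x_m, x]\, \frac{d^{m-1}}{dx^{m-1}}\left(x^m e^{-x}\right) dx \\
&= \int_0^{\infty} x^m e^{-x}\, \frac{d^{m-1}}{dx^{m-1}} f[x_1, \ldots, x_m, x]\, dx \\
&= \frac{(m - 1)!}{(2m - 1)!}\, f^{(2m-1)}(\xi) \int_0^{\infty} x^m e^{-x}\, dx \qquad [\text{see } (5.10.36)] \\
&= \frac{m!\,(m - 1)!}{(2m - 1)!}\, f^{(2m-1)}(\xi) \qquad (\xi > 0)
\end{aligned}$$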


Also verify that the same result follows directly from (8.10.22).

35 Show that the results of Probs. 32 to 34 reduce to the formulas

$$\int_0^{\infty} e^{-x} f(x)\, dx = \frac{1}{2}\,[f(0) + f(2)] + \frac{1}{3} f'''(\xi)$$

and

$$\int_0^{\infty} e^{-x} f(x)\, dx = \frac{1}{3} f(0) + \frac{1}{6}\left[(2 + \sqrt{3})\, f(3 - \sqrt{3}) + (2 - \sqrt{3})\, f(3 + \sqrt{3})\right] + \frac{1}{10} f^{v}(\xi)$$

when m = 2 and 3, where ξ > 0 in both cases.
36 Use the second formula of Prob. 35 to approximate the integral 


$$\int_0^{\infty} e^{-x} \sin x\, dx$$


and compare the result with that of using two (nonzero) ordinates in Prob. 18. 


Section 8.11 


37 Obtain approximate values of the integral 


$$\int_0^{\pi/2} \sin x\, dx$$


by use of Radau quadratures employing two, three, four, and five ordinates, 
taking the vanishing ordinate as the assigned one. Also compare the results with 
corresponding ones (employing one, two, three, and four nonvanishing ordinates) 
in Prob. 14. 

38 Proceed as in Prob. 37 with the integral 


$$\int_0^{1} x \cos x\, dx$$


and compare the results when m = 2 and 3 with those given by the two explicit 
formulas of Prob. 9 (in which x is to be considered as a weighting function). 


Section 8.12 


39 Obtain approximate values of the integral 


$$\int_0^{\pi} \sin x\, dx$$

by use of Lobatto quadratures employing three, four, five, and six ordinates. Also
compare the results with corresponding ones (employing one, two, three, and four
nonvanishing ordinates) in Prob. 15.

40 Proceed as in Prob. 39 with the integral

$$\int_{-1}^{1} (1 - x^2) \cos x\, dx$$

and compare the results when m = 3 and 4 with corresponding ones obtained by
using (8.9.13) and employing like numbers (one and two) of nonvanishing
ordinates.

41 Derive the formula

$$\int_{-1}^{1} \frac{f(x)}{\sqrt{1 - x^2}}\, dx = \frac{\pi}{m}\left[\frac{1}{2} f(-1) + \frac{1}{2} f(1) + \sum_{k=1}^{m-1} f\!\left(\cos \frac{k\pi}{m}\right)\right] - \frac{2\pi}{2^{2m}\,(2m)!}\, f^{(2m)}(\xi)$$


making use of Prob. 33 of Chap. 7 and of Eq. (7.9.21), and noticing that m + 1 
ordinates are employed. 


Section 8.13 


42 Determine Legendre-Gauss approximations to the integral

$$\int_{-1}^{1} \frac{dx}{x^2 + R^2}$$

for R = 4, 2, 1, and 1/2. In each case obtain a sequence of four approximations,
using successively m = 2, 3, 4, and 5 points, and calculate each error $E_m(R)$.
Then, for each value of R, determine the successive ratios $|E_{m+1}(R)/E_m(R)|$ and
compare the results with the asymptotic value $1/(4R^2)$ predicted by (8.13.7) as
m → ∞ (when R > 1/2). (Rounded true values: 0.1224893, 0.4636476, 1.5707963,
4.4285949.)

43 Proceed as in Prob. 42, obtaining sequences of Hermite-Gauss approximations to
the integral

$$\int_{-\infty}^{\infty} \frac{e^{-x^2}\, dx}{x^2 + R^2}$$

(Rounded true values: 0.1075993, 0.4011746, 1.3432934, 3.8684965.)


Section 8.14 


44 Rework Prob. 13 by use of Chebyshev quadratures employing two, three, four,
and five ordinates, and compare the results with the results of that problem.

45 Suppose that independent errors $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m$ are associated with the m ordinates
used in a Chebyshev quadrature over [−1, 1], and that each of these errors is
distributed about a zero mean with RMS deviation not exceeding $\varepsilon_{RMS}$. If R is
the corresponding error in the approximate integral, show that

$$|R|_{\max} = 2\,|\varepsilon|_{\max} \qquad R_{RMS} = \frac{2}{\sqrt{m}}\, \varepsilon_{RMS}$$

Show also that the first relation holds for Legendre-Gauss quadrature while the
factor $2/\sqrt{m}$ in the second relation is increased by only about 5 percent when
m = 3, 4, and 5 (the same is true for m = 6, 7, and 9) but that somewhat larger
increases in this factor occur when corresponding Newton-Cotes formulas are
used over [−1, 1]. Also determine the values of this factor associated with the
trapezoidal and parabolic rules when m = 3, 5, 7, and 9.

46 Verify that the interval [−1, 1] can be replaced by [a, b] in (8.14.1) and (8.14.17),
when [a, b] ≠ (−∞, ∞), and, in the Laguerre-Chebyshev case when [a, b] =
[0, ∞) and $w(x) = e^{-x}$, show that the m relevant abscissas are to be the zeros of
the polynomial part of the formal expansion

$$x^m \exp\left[-m\left(\frac{1}{x} + \frac{1}{x^2} + \frac{2}{x^3} + \frac{6}{x^4} + \cdots\right)\right] = x^m - m x^{m-1} + \frac{m}{2}(m - 2)\,x^{m-2} - \frac{m}{6}(m^2 - 6m + 12)\,x^{m-3} + \frac{m}{24}(m^3 - 12m^2 + 60m - 144)\,x^{m-4} + \cdots$$

if those zeros are real. Show further that, when m = 2, the quadrature formula
is identical with the first formula of Prob. 35, and also that two of the abscissas
are nonreal when m = 3, so that no three-point formula of the required type can
exist. [The same has been shown to be true for all m ≥ 3 (see Krylov [1958]).]

47 Assuming the validity of (8.14.17) with [−1, 1] replaced by (−∞, ∞) in the
Hermite-Chebyshev case when $w(x) = e^{-x^2}$ (a modified derivation is necessary),
show that the relevant abscissas are to be the zeros of the polynomial part of the
formal expansion

$$x^m \exp\left[-m\left(\frac{1}{4x^2} + \frac{3}{16x^4} + \cdots\right)\right] = x^m - \frac{m}{4}\,x^{m-2} + \frac{m}{32}(m - 6)\,x^{m-4} - \cdots$$

that the zeros are real when m = 2 and m = 3, but that two of the zeros are
nonreal when m = 4 and m = 5. [The presence of nonreal zeros has been
established for all m ≥ 4 (see Krylov [1958]).]

48 Show that the two-point formula of Prob. 47 is identical with the Hermite-Gauss
two-point formula, and that the three-point formula is of the form

$$\int_{-\infty}^{\infty} e^{-x^2} f(x)\, dx = \frac{\sqrt{\pi}}{3}\left[f\!\left(-\frac{\sqrt{3}}{2}\right) + f(0) + f\!\left(\frac{\sqrt{3}}{2}\right)\right] + E$$

Show also that the error term can be expressed in the form

$$E = \int_{-\infty}^{\infty} e^{-x^2}\left(x^3 - \tfrac{3}{4}x\right) f[x_1, x_2, x_3, x]\, dx$$

and transformed by integration by parts to give

$$E = \frac{1}{2} \int_{-\infty}^{\infty} e^{-x^2}\left(x^2 + \tfrac{1}{4}\right) f[x_1, x_2, x_3, x, x]\, dx = \frac{\sqrt{\pi}}{64}\, f^{iv}(\xi)$$

for some value of ξ.


Section 8.15 


49 Determine algebraically the unknown abscissas and/or weights for the formula

$$\int_{-1}^{1} f(x)\, dx = \sum_{k=1}^{3} W_k f(x_k) + E$$

subject to the requirement that the degree of precision be as high as possible in
consistency with each of the following sets of constraints, and determine the
degree of precision in each case:

(a) $x_1 = -\tfrac{1}{2}$, $x_2 = 0$, $x_3 = \tfrac{1}{2}$

(b) No constraints

(c) $W_1 = W_2 = W_3$

(d) $x_1 = -1$

50 Suppose that the abscissas $x_1 = -1$ and $x_2 = \alpha$ are assigned and that the
quadrature formula

$$\int_{-1}^{1} f(x)\, dx = W_1 f(-1) + W_2 f(\alpha) + W_3 f(x_3) + E \qquad (-1 < \alpha)$$

is to possess a degree of precision of at least 3. Determine $x_3$ and the three weights
as functions of α, by algebraic methods, showing, in particular, that no such
formula exists if α = 1/3, that $x_3$ is outside [−1, 1] for all other α such that
0 < α < 1/2, and that the ordinate at x = −1 is not involved if $\alpha = \pm 1/\sqrt{3}$.

51 Show that the two-point gaussian quadrature formula of the form

$$\int_0^{\pi} f(x) \sin nx\, dx = W_1 f(x_1) + W_2 f(x_2) + E$$

where n is a nonnegative integer, is such that

$$\int_0^{\pi} f(x) \sin nx\, dx = \frac{\pi}{2n\sqrt{\dfrac{\pi^2}{4} - \dfrac{6}{n^2}}}\left[f\!\left(\frac{\pi}{2} - \sqrt{\frac{\pi^2}{4} - \frac{6}{n^2}}\right) - f\!\left(\frac{\pi}{2} + \sqrt{\frac{\pi^2}{4} - \frac{6}{n^2}}\right)\right] + E$$

when n is even, and such that

$$\int_0^{\pi} f(x) \sin nx\, dx = \frac{1}{n}\left[f\!\left(\frac{\pi}{2} - \sqrt{\frac{\pi^2}{4} - \frac{2}{n^2}}\right) + f\!\left(\frac{\pi}{2} + \sqrt{\frac{\pi^2}{4} - \frac{2}{n^2}}\right)\right] + E$$

when n is odd. Show also that the degree of precision is 4 when n is even and 3
when n is odd.

52 Show that the error term in the quadrature formula of Prob. 51 can be expressed
in the form

$$E = \int_0^{\pi} f[x_1, x_2, x]\, \pi(x) \sin nx\, dx = \frac{1}{2} \int_0^{\pi} f''(\eta)\, \pi(x) \sin nx\, dx \qquad (0 < \eta < \pi)$$

where $\pi(x) = x^2 - \pi x + 6n^{-2}$ when n is even and $\pi(x) = x^2 - \pi x + 2n^{-2}$
when n is odd, and that, when n = 1, this expression can be transformed to

$$E = \frac{f^{iv}(\xi)}{4!} \int_0^{\pi} [\pi(x)]^2 \sin x\, dx = \frac{10 - \pi^2}{6}\, f^{iv}(\xi) \qquad (0 < \xi < \pi)$$

[Notice that here w(x) = sin nx changes sign inside the range of integration when
n > 1.]

53 Derive the gaussian integration formulas

$$\int_0^{1} f(x) \log x\, dx = -f\!\left(\tfrac{1}{4}\right) + E$$

and

$$\int_0^{1} f(x) \log x\, dx = -W_1 f(x_1) - W_2 f(x_2) + E$$

where

$$x_1 = \frac{15 - \sqrt{106}}{42} = 0.112009 \qquad x_2 = \frac{15 + \sqrt{106}}{42} = 0.602277$$

and

$$W_1 = \frac{212 + 9\sqrt{106}}{424} = 0.718539 \qquad W_2 = \frac{212 - 9\sqrt{106}}{424} = 0.281461$$

54 Show that the error terms associated with the two formulas of Prob. 53 are of
the forms

$$E = \int_0^{1} \left(x - \tfrac{1}{4}\right)^2 f[x_1, x_1, x] \log x\, dx = -\frac{7}{288}\, f''(\xi)$$

and

$$E = \int_0^{1} \left(x^2 - \tfrac{5}{7}x + \tfrac{17}{252}\right)^2 f[x_1, x_1, x_2, x_2, x] \log x\, dx = -\frac{647}{5443200}\, f^{iv}(\xi) \approx -0.00012\, f^{iv}(\xi)$$

respectively, where 0 < ξ < 1 in each case.


Section 8.16 


55 Derive the relations (8.16.15) to (8.16.18). 
56 Derive the results of (8.16.28) and (8.16.30). 
57 Use (8.16.25) to approximate the integral 


$$\int_0^{\pi} \sin\theta \cos k\theta\, d\theta = \frac{2}{1 - k^2} \qquad (k = 0, 2, 4, \ldots)$$

for k = 2, 4, and 6, and verify that the relative error in each case is approximately
K,/(2k*) = 0.0036/k*. 


9 


APPROXIMATIONS OF VARIOUS TYPES 


9.1 Introduction 


Whereas polynomials usually are convenient coordinate functions for the 
approximation of a continuous function (or for least-squares approximation of 
a function which is continuous except for finite "jumps") when the desired
interval of approximation is finite, they are well adapted to the approximation of 
periodic functions only over relatively short ranges. When f(x) is periodic and 
is to be approximated over one or more complete periods, it is desirable to make 
use of periodic coordinate functions, having the same period as f(x), in con- 
structing its approximation. The most convenient set of such functions (which, 
indeed, satisfies all the requirements of Sec. 1.2 when f is also continuous) is 
the composite set of all sines and cosines which possess that period. Although 
formulas analogous to Lagrange's formula exist for the determination of such
an approximation (see Chap. 3, Prob. 7), they are seldom used, and resort is
usually had to least-squares methods. The relevant analysis, due originally to 
Fourier and often known as harmonic analysis, is presented and illustrated for 
continuous domains in Sec. 9.2 and for discrete domains in Sec. 9.3. 

When empirical data correspond to a simple decay or growth process, 
or to a combination of such processes, and an approximation is desired for a 
semi-infinite range of the independent variable (frequently representing time), 
real exponential functions are appropriate coordinate functions. On the other 
hand, when the superposition of two or more simple or damped harmonics, of 
unknown periods, is to be analyzed, complex exponential functions are ap- 
propriate. Prony’s method of curve fitting, which includes both these cases 
when the data points are equally spaced, is presented in Sec. 9.4 and is specialized 
to the second case in Sec. 9.5. 

Methods of optimum collocative polynomial interpolation are considered 
in Secs. 9.6 and 9.7, the Lanczos method of improving the efficiency of a given 
polynomial approximation is described in Sec. 9.8, and further approaches to 
uniform (minimax) polynomial approximation are considered in Sec. 9.9. 
Sections 9.10 to 9.13 are devoted to approximation by a class of coordinate 
functions, known as cubic splines, which may be described as sets of cubic 
polynomial segments with smooth joins. 

A natural generalization of polynomial approximation consists of approx- 
imation by ratios of polynomials, that is, by rational functions. Such approx- 
imations can be expressed conveniently in terms of continued fractions and are 
treated in the concluding sections of this chapter (Secs. 9.14 to 9.18). 


9.2 Fourier Approximation: Continuous Domain 


We suppose here that a function f(x) to be approximated is a periodic function,
of known period, and that the scale of units has been so adjusted that the period
is 2π, so that

$$f(x + 2\pi) = f(x) \qquad (9.2.1)$$

A particularly convenient class of coordinate functions is represented by the set

$$1,\ \cos x,\ \cos 2x,\ \ldots,\ \cos rx,\ \ldots;\quad \sin x,\ \sin 2x,\ \ldots,\ \sin rx,\ \ldots$$

each member of which is of period 2π. This set has the useful property that the
product of any two members is expressible as a linear combination of two
members. Also, the derivative of each member is also a member, and the same
is true of the integral of each member except the constant.

But the principal source of convenience is the verifiable fact that the set is
orthogonal over any period interval, say, the interval [−π, π], so that

$$\begin{aligned}
\int_{-\pi}^{\pi} \sin jx \sin kx\, dx &= 0 \qquad (j \ne k) \\
\int_{-\pi}^{\pi} \cos jx \cos kx\, dx &= 0 \qquad (j \ne k) \\
\int_{-\pi}^{\pi} \sin jx \cos kx\, dx &= 0
\end{aligned} \qquad (9.2.2)$$

when j and k are nonnegative integers. Clearly, negative integers need not be
considered.
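
For reference, the product property and the orthogonality relations both rest on the standard product-to-sum identities, written out here as a supplementary note (they are not displayed in the text itself). Integrating each right-hand member over [−π, π] gives zero unless a term reduces to a constant, which is the content of (9.2.2).

```latex
\begin{aligned}
\cos jx \cos kx &= \tfrac{1}{2}\,[\cos (j - k)x + \cos (j + k)x] \\
\sin jx \sin kx &= \tfrac{1}{2}\,[\cos (j - k)x - \cos (j + k)x] \\
\sin jx \cos kx &= \tfrac{1}{2}\,[\sin (j + k)x + \sin (j - k)x]
\end{aligned}
```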

Suppose now that we require a so-called Fourier approximation to f(x)
of the form

$$f(x) \approx a_0 + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx) \qquad (9.2.3)$$

involving harmonics through the nth, where the coefficients are to be
determined in such a way that the integrated squared error over an interval of length
2π is least. From the periodicity of f(x) and of the sine and cosine harmonics,
it follows that attention may be restricted to any period interval, say, the interval
[−π, π]. Then the requirement

$$\int_{-\pi}^{\pi} \left[f(x) - a_0 - \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx)\right]^2 dx = \min \qquad (9.2.4)$$

leads to the necessary conditions

$$\begin{aligned}
\int_{-\pi}^{\pi} \left[f(x) - a_0 - \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx)\right] dx &= 0 \\
\int_{-\pi}^{\pi} \cos rx \left[f(x) - a_0 - \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx)\right] dx &= 0 \\
\int_{-\pi}^{\pi} \sin rx \left[f(x) - a_0 - \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx)\right] dx &= 0
\end{aligned} \qquad (9.2.5)$$

when the partial derivatives of the left-hand member of (9.2.4) with respect to
$a_0$, $a_r$, and $b_r$ are equated to zero. Reference to the relations (9.2.2) and to the
relations

$$\int_{-\pi}^{\pi} dx = 2\pi \qquad \int_{-\pi}^{\pi} \cos^2 kx\, dx = \int_{-\pi}^{\pi} \sin^2 kx\, dx = \pi \quad (k \ne 0) \qquad (9.2.6)$$

then leads to the determinations

$$\begin{aligned}
a_0 &= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, dx \qquad a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx\, dx \quad (k \ne 0) \\
b_k &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx\, dx
\end{aligned} \qquad (9.2.7)$$

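If f(x) is an even function, so that f(−x) = f(x), it is seen that $b_k = 0$,
so that (9.2.3) then reduces to

$$f(x) \approx a_0 + \sum_{k=1}^{n} a_k \cos kx \qquad (9.2.8)$$

where

$$\begin{aligned}
a_0 &= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, dx = \frac{1}{\pi} \int_0^{\pi} f(x)\, dx \\
a_k &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx\, dx = \frac{2}{\pi} \int_0^{\pi} f(x) \cos kx\, dx \quad (k \ne 0)
\end{aligned} \qquad (9.2.9)$$

Similarly, if f(x) is an odd function, so that f(−x) = −f(x), there follows
$a_0 = a_k = 0$ and (9.2.3) then becomes

$$f(x) \approx \sum_{k=1}^{n} b_k \sin kx \qquad (9.2.10)$$

where

$$b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx\, dx = \frac{2}{\pi} \int_0^{\pi} f(x) \sin kx\, dx \qquad (9.2.11)$$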

If the periodic function f(x) is fairly well behaved [in particular, if only
f(x) is bounded and piecewise differentiable], it is known that the approximation
actually tends to f(x) as n → ∞ for all values of x at which f(x) is continuous,
and that it tends to the mean value ½[f(x+) + f(x−)] of the right- and left-
hand limits at each point of discontinuity.

It is important to notice that, as is typical of least-squares approximations
by orthogonal functions, each coefficient is determined independently of all
others, and its value does not depend upon the number of harmonics to be
retained in the approximation. As an example, suppose that f(x) is defined in
[−π, π] in such a way that

$$f(x) = \begin{cases} 0 & (-\pi \le x < 0) \\ x & (0 \le x < \pi/2) \\ \pi/2 & (\pi/2 \le x < \pi) \end{cases} \qquad (9.2.12)$$

and is defined elsewhere by the requirement that it be periodic, with period 2π
(see Fig. 9.1). Since f(x) is neither even nor odd, the presence of both sine and
cosine harmonics may be anticipated.

FIGURE 9.1

Equations (9.2.7) give

$$\begin{aligned}
a_0 &= \frac{1}{2\pi}\left[\int_{-\pi}^{0} 0\, dx + \int_{0}^{\pi/2} x\, dx + \int_{\pi/2}^{\pi} \frac{\pi}{2}\, dx\right] = \frac{3\pi}{16} \\
a_k &= \frac{1}{\pi}\left[\int_{-\pi}^{0} 0 \cos kx\, dx + \int_{0}^{\pi/2} x \cos kx\, dx + \int_{\pi/2}^{\pi} \frac{\pi}{2} \cos kx\, dx\right] = -\frac{1}{\pi k^2}\left(1 - \cos \frac{k\pi}{2}\right) \quad (k \ne 0) \\
b_k &= \frac{1}{\pi}\left[\int_{-\pi}^{0} 0 \sin kx\, dx + \int_{0}^{\pi/2} x \sin kx\, dx + \int_{\pi/2}^{\pi} \frac{\pi}{2} \sin kx\, dx\right] = \frac{1}{\pi k^2}\left(\sin \frac{k\pi}{2} - \frac{k\pi}{2} \cos k\pi\right)
\end{aligned}$$


Thus there follows

$$\begin{aligned}
f(x) = \frac{3\pi}{16} &- \frac{1}{\pi} \cos x - \frac{1}{2\pi} \cos 2x - \frac{1}{9\pi} \cos 3x - \cdots \\
&+ \frac{2 + \pi}{2\pi} \sin x - \frac{1}{4} \sin 2x + \frac{3\pi - 2}{18\pi} \sin 3x - \cdots
\end{aligned} \qquad (9.2.13)$$

when x ≠ ±π, ±3π, .... If the best least-squares approximation to f(x)
over [−π, π] involving only harmonics through the second were required, it
would be obtained, by suppressing all higher harmonics in (9.2.13), in the form

$$f(x) \approx 0.589 - 0.318 \cos x - 0.159 \cos 2x + 0.818 \sin x - 0.250 \sin 2x$$

if (say) the coefficients were rounded to three places.
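
The coefficients of this example can also be checked numerically. The following is a minimal sketch, not part of the original text: it evaluates the integrals of (9.2.7) by a composite midpoint rule for the function of (9.2.12). The helper names `integrate` and `fourier_coefficients` and the panel count are illustrative assumptions.

```python
import math

def f(x):
    # The example function (9.2.12) on [-pi, pi): 0, then x, then pi/2.
    if x < 0.0:
        return 0.0
    if x < math.pi / 2:
        return x
    return math.pi / 2

def integrate(g, a, b, panels=20000):
    # Composite midpoint rule (an assumed choice of quadrature for this sketch).
    h = (b - a) / panels
    return h * sum(g(a + (i + 0.5) * h) for i in range(panels))

def fourier_coefficients(func, n):
    # Coefficients of (9.2.3) obtained from the integrals in (9.2.7).
    a = [integrate(func, -math.pi, math.pi) / (2 * math.pi)]
    b = [0.0]  # placeholder so that b[k] lines up with the kth harmonic
    for k in range(1, n + 1):
        a.append(integrate(lambda x: func(x) * math.cos(k * x), -math.pi, math.pi) / math.pi)
        b.append(integrate(lambda x: func(x) * math.sin(k * x), -math.pi, math.pi) / math.pi)
    return a, b

a, b = fourier_coefficients(f, 2)
print([round(v, 3) for v in a])      # a_0, a_1, a_2
print([round(v, 3) for v in b[1:]])  # b_1, b_2
```

Because each coefficient is computed independently, asking for more harmonics (a larger `n`) leaves the values already obtained unchanged.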

Because of the discontinuities in f(x), a rather large number of terms 
would be needed, in this particular case, to afford a good approximation to 
f(x), particularly near the discontinuities. However, there are in fact many
practical situations in which only the coefficients of certain harmonics of low 
order are required and in which the degree of approximation afforded by a 
given number of harmonics is not of great interest. 

It is clear that if f(x) were not periodic, but were defined by (9.2.12) in
the interval [−π, π], the expansion (9.2.13) still would be valid inside that
interval regardless of the behavior of f(x) elsewhere. More generally, if the
representation (9.2.3) were determined according to the formulas of (9.2.7)
for any function f(x) for which the integrals exist, the result would comprise
the n-harmonic least-squares Fourier approximation to f(x) over the interval
[−π, π]. Outside that interval, the trigonometric expression would continue
to define a periodic function regardless of the behavior of f(x) itself outside
that interval.

Further, if f(x) is defined in [0, π], and if the coefficients in the
approximation

$$f(x) \approx a_0 + \sum_{k=1}^{n} a_k \cos kx \qquad (0 \le x \le \pi) \qquad (9.2.14)$$

are determined by the equations

$$a_0 = \frac{1}{\pi} \int_0^{\pi} f(x)\, dx \qquad a_k = \frac{2}{\pi} \int_0^{\pi} f(x) \cos kx\, dx \quad (k = 1, 2, \ldots, n) \qquad (9.2.15)$$

the result will represent the n-harmonic least-squares Fourier cosine
approximation to f(x) over [0, π]. Similarly, the corresponding least-squares
sine-harmonic approximation over that half-range is given by

$$f(x) \approx \sum_{k=1}^{n} b_k \sin kx \qquad (0 \le x \le \pi) \qquad (9.2.16)$$

where

$$b_k = \frac{2}{\pi} \int_0^{\pi} f(x) \sin kx\, dx \qquad (9.2.17)$$

These results are immediate consequences of the verifiable orthogonality
relations

$$\int_0^{\pi} \sin jx \sin kx\, dx = \begin{cases} 0 & (j \ne k) \\ \pi/2 & (j = k \ne 0) \end{cases} \qquad (9.2.18)$$

$$\int_0^{\pi} \cos jx \cos kx\, dx = \begin{cases} 0 & (j \ne k) \\ \pi/2 & (j = k \ne 0) \\ \pi & (j = k = 0) \end{cases} \qquad (9.2.19)$$

where j and k are nonnegative integers. These relations can also be deduced
directly from (9.2.2).
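
One way to carry out that deduction, given here as a brief supplementary sketch, is to note that the products sin jx sin kx and cos jx cos kx are even functions of x, so that each half-range integral is one-half of the corresponding integral over [−π, π]. The values then follow from (9.2.2) when j ≠ k and from (9.2.6) when j = k.

```latex
\begin{aligned}
\int_0^{\pi} \sin jx \sin kx\, dx &= \frac{1}{2} \int_{-\pi}^{\pi} \sin jx \sin kx\, dx \\
\int_0^{\pi} \cos jx \cos kx\, dx &= \frac{1}{2} \int_{-\pi}^{\pi} \cos jx \cos kx\, dx
\end{aligned}
```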


9.3 Fourier Approximation: Discrete Domain 


We assume again that f(x) is of period 2π, but now suppose that its values are
known only on a discrete set of equally spaced points in a period interval, say
at the 2N + 1 points

$$-\pi,\ -\frac{(N - 1)\pi}{N},\ \ldots,\ -\frac{\pi}{N},\ 0,\ \frac{\pi}{N},\ \ldots,\ \frac{(N - 1)\pi}{N},\ \pi$$

of the interval [−π, π]. Since f(−π) = f(π), from the assumed periodicity,†
we then have 2N independent data, which may be expected to serve to
determine the coefficients of 2N terms of an approximation of the form (9.2.3).
If we denote the rth abscissa as

$$x_r = \frac{r\pi}{N} \qquad (9.3.1)$$

for r = −N + 1, −N + 2, ..., −1, 0, 1, ..., N, so that the 2N
independent values $f_r = f(x_r)$ are prescribed, we may verify that only the 2N
functions

$$1,\ \cos x,\ \cos 2x,\ \ldots,\ \cos Nx;\quad \sin x,\ \sin 2x,\ \ldots,\ \sin (N - 1)x$$

of the set considered in the preceding section are independent over the domain
comprising this set of abscissas; for the function sin Nx vanishes at each of
these points, and each of the functions cos (N + 1)x, ... and sin (N + 1)x, ...
takes on the same values at points in the set as does one of the 2N functions listed
above.

† If f(x) is discontinuous at the ends of the period interval [−π, π] or, in other situations,
is undefined outside that interval, the mean value
$$[f(\pi-) + f(\pi+)]/2 = [f(\pi-) + f(-\pi+)]/2$$
is to be assigned to f(x) at both end points.

For example, since $\sin Nx_r = 0$, we have

$$\cos (N + 1)x_r = \cos Nx_r \cos x_r = (-1)^r \cos x_r = \cos (N - 1)x_r$$
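
The coincidence of values just described is easy to confirm numerically. The short sketch below is illustrative only; the choice N = 6 and the variable names are assumptions. It evaluates sin Nx and the difference cos (N + 1)x − cos (N − 1)x on the abscissas x_r = rπ/N.

```python
import math

N = 6  # illustrative choice; any positive integer behaves the same way
xs = [r * math.pi / N for r in range(-N + 1, N + 1)]  # the 2N abscissas x_r

# sin Nx vanishes (to rounding error) at every abscissa ...
max_sin = max(abs(math.sin(N * x)) for x in xs)

# ... and cos (N+1)x cannot be told apart from cos (N-1)x on this set.
max_alias = max(abs(math.cos((N + 1) * x) - math.cos((N - 1) * x)) for x in xs)

print(max_sin, max_alias)  # both are at the level of rounding error
```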


It is possible to show that this set of functions is orthogonal under
summation over the set (9.3.1) (see Probs. 7 and 8), so that, with the notation of
(9.3.1),

$$\begin{aligned}
\sum_{r=-N+1}^{N} \sin jx_r \sin kx_r &= 0 \qquad (j \ne k) \\
\sum_{r=-N+1}^{N} \cos jx_r \cos kx_r &= 0 \qquad (j \ne k) \\
\sum_{r=-N+1}^{N} \sin jx_r \cos kx_r &= 0
\end{aligned} \qquad (9.3.2)$$

when j and k are integers between 0 and N, inclusive, in analogy to (9.2.2).
Furthermore, in the excluded cases for which j = k, the results

$$\begin{aligned}
\sum_{r=-N+1}^{N} \sin^2 kx_r &= \sum_{r=-N+1}^{N} \cos^2 kx_r = N \qquad (k \ne 0, N) \\
\sum_{r=-N+1}^{N} 1 &= 2N \qquad \sum_{r=-N+1}^{N} \cos^2 Nx_r = 2N
\end{aligned} \qquad (9.3.3)$$

can be established in analogy to (9.2.6).
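
As a numerical spot check of (9.3.2) and (9.3.3), offered as a sketch and not a proof, the sums can be formed directly. The function name `discrete_sums` and the choice N = 6 are illustrative assumptions.

```python
import math

def discrete_sums(N):
    # Abscissas x_r = r*pi/N for r = -N+1, ..., N, as in (9.3.1).
    xs = [r * math.pi / N for r in range(-N + 1, N + 1)]
    worst_cross = 0.0  # largest |sum| among the products that (9.3.2) says vanish
    squares = {}       # k -> (sum of sin^2 kx_r, sum of cos^2 kx_r)
    for j in range(N + 1):
        for k in range(N + 1):
            ss = sum(math.sin(j * x) * math.sin(k * x) for x in xs)
            cc = sum(math.cos(j * x) * math.cos(k * x) for x in xs)
            sc = sum(math.sin(j * x) * math.cos(k * x) for x in xs)
            worst_cross = max(worst_cross, abs(sc))
            if j != k:
                worst_cross = max(worst_cross, abs(ss), abs(cc))
            else:
                squares[k] = (ss, cc)
    return worst_cross, squares

worst, squares = discrete_sums(6)
print(worst)  # rounding-level residue of the sums in (9.3.2)
print({k: (round(s, 10), round(c, 10)) for k, (s, c) in squares.items()})
```

For k = 0 and k = N the sine sum is zero and the cosine sum is 2N, which is why those two cases are listed separately in (9.3.3).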


If now an approximation is assumed in the form

$$f(x) \approx A_0 + \sum_{k=1}^{n} (A_k \cos kx + B_k \sin kx) \qquad (9.3.4)$$

where n ≤ N, and if the least-squares criterion

$$\sum_{r=-N+1}^{N} \left[f(x_r) - A_0 - \sum_{k=1}^{n} (A_k \cos kx_r + B_k \sin kx_r)\right]^2 = \min \qquad (9.3.5)$$

is adopted, a derivation completely analogous to that leading from (9.2.4) to
(9.2.7), making use of (9.3.2) and (9.3.3), yields the determinations

$$\begin{aligned}
A_0 &= \frac{1}{2N} \sum_{r=-N+1}^{N} f(x_r) & A_k &= \frac{1}{N} \sum_{r=-N+1}^{N} f(x_r) \cos kx_r \quad (k \ne 0, N) \\
A_N &= \frac{1}{2N} \sum_{r=-N+1}^{N} f(x_r) \cos Nx_r & B_k &= \frac{1}{N} \sum_{r=-N+1}^{N} f(x_r) \sin kx_r
\end{aligned} \qquad (9.3.6)$$

Thus the coefficients in (9.3.4) are easily obtained by summation, and the
calculation of each coefficient is again independent of the calculation of the others,
and is independent of n as long as n ≤ N. When n = N, the least-squares
criterion becomes equivalent to the requirement that the two members of (9.3.4)
agree exactly at the 2N points specified by (9.3.1).

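The formulas (9.3.6) can be written in the more symmetrical forms

$$\begin{aligned}
A_0 &= \frac{1}{2N}\left(\tfrac{1}{2} f_{-N} + f_{-N+1} + \cdots + f_{-1} + f_0 + f_1 + \cdots + f_{N-1} + \tfrac{1}{2} f_N\right) \\
A_k &= \frac{1}{N}\left(\tfrac{1}{2} f_{-N} \cos kx_{-N} + f_{-N+1} \cos kx_{-N+1} + \cdots + f_{-1} \cos kx_{-1} + f_0 \cos kx_0 + f_1 \cos kx_1 + \cdots + f_{N-1} \cos kx_{N-1} + \tfrac{1}{2} f_N \cos kx_N\right) \quad (k \ne 0, N) \\
A_N &= \frac{1}{2N}\left(\tfrac{1}{2} f_{-N} \cos Nx_{-N} + f_{-N+1} \cos Nx_{-N+1} + \cdots + f_{-1} \cos Nx_{-1} + f_0 \cos Nx_0 + f_1 \cos Nx_1 + \cdots + f_{N-1} \cos Nx_{N-1} + \tfrac{1}{2} f_N \cos Nx_N\right) \\
B_k &= \frac{1}{N}\left(\tfrac{1}{2} f_{-N} \sin kx_{-N} + f_{-N+1} \sin kx_{-N+1} + \cdots + f_{-1} \sin kx_{-1} + f_0 \sin kx_0 + f_1 \sin kx_1 + \cdots + f_{N-1} \sin kx_{N-1} + \tfrac{1}{2} f_N \sin kx_N\right)
\end{aligned} \qquad (9.3.7)$$

in view of the relation $f_{-N} = f_N$.
If we notice that the spacing h is given by

$$h = \frac{\pi}{N} \qquad (9.3.8)$$

we may observe the curious fact that Eqs. (9.3.7) are identical with the results of
using the trapezoidal rule to approximate the right-hand members of (9.2.7),
when k < N.†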

For the purpose of numerical calculation, it is convenient to resolve
f(x) into its even and odd components by introducing the auxiliary functions

$$F(x) = \tfrac{1}{2}\,[f(x) + f(-x)] \qquad G(x) = \tfrac{1}{2}\,[f(x) - f(-x)] \qquad (9.3.9)$$

so that

$$f(x) = F(x) + G(x) \qquad (9.3.10)$$

† In this connection, it is interesting to recall that the Euler-Maclaurin sum formula,
written in the form (5.8.17), reduces to the trapezoidal rule for any periodic function,
with period equal to the length of the range of integration. That is, the "correction
terms" in that formula all vanish in any such case. This fact obviously does not
indicate that the trapezoidal rule is "exact" for periodic functions since the error
term (5.8.14) remains, but may indeed serve to illustrate the dangers associated
with lack of proper regard for the error term in such formulas.

If we recall that $x_{-r} = -x_r$, and that $x_0 = 0$, we find that Eqs. (9.3.6) or
(9.3.7) may be reduced to the forms

$$\begin{aligned}
A_0 &= \frac{1}{N}\left(\tfrac{1}{2} F_0 + F_1 + F_2 + \cdots + F_{N-1} + \tfrac{1}{2} F_N\right) \\
A_k &= \frac{2}{N}\left(\tfrac{1}{2} F_0 + F_1 \cos kx_1 + F_2 \cos kx_2 + \cdots + F_{N-1} \cos kx_{N-1} + \tfrac{1}{2} F_N \cos kx_N\right) \quad (k \ne 0, N) \\
A_N &= \frac{1}{N}\left(\tfrac{1}{2} F_0 - F_1 + F_2 - \cdots + (-1)^{N-1} F_{N-1} + (-1)^N \tfrac{1}{2} F_N\right) \\
B_k &= \frac{2}{N}\left(G_1 \sin kx_1 + G_2 \sin kx_2 + \cdots + G_{N-1} \sin kx_{N-1}\right)
\end{aligned} \qquad (9.3.11)$$

In order to illustrate the use of these formulas, we consider the case N = 6,
corresponding to the use of 12 independent ordinates. The tabular forms given
in Table 9.1a and b are then appropriate for desk calculation (although further
systematization is possible). In Table 9.1a, the sum of the entries in the data
column is $6A_0$, whereas the sum of products of corresponding entries in the data
column and the column headed cos kx is $3A_k$, or $6A_6$. Similarly, the sum of
products of corresponding entries in the data column of Table 9.1b and the
column headed sin kx is $3B_k$.


Table 9.1a

x       Data                       cos x    cos 2x   cos 3x   cos 4x   cos 5x   cos 6x

0       ½f_0 = ½F_0                 1        1        1        1        1        1
π/6     ½(f_1 + f_{-1}) = F_1       ½√3      ½        0       −½       −½√3     −1
π/3     ½(f_2 + f_{-2}) = F_2       ½       −½       −1       −½        ½        1
π/2     ½(f_3 + f_{-3}) = F_3       0       −1        0        1        0       −1
2π/3    ½(f_4 + f_{-4}) = F_4      −½       −½        1       −½       −½        1
5π/6    ½(f_5 + f_{-5}) = F_5      −½√3      ½        0       −½        ½√3     −1
π       ½f_6 = ½F_6                −1        1       −1        1       −1        1

Sums    6A_0                        3A_1     3A_2     3A_3     3A_4     3A_5     6A_6

Table 9.1b

x       Data                       sin x    sin 2x   sin 3x   sin 4x   sin 5x

π/6     ½(f_1 − f_{-1}) = G_1       ½        ½√3      1        ½√3      ½
π/3     ½(f_2 − f_{-2}) = G_2       ½√3      ½√3      0       −½√3     −½√3
π/2     ½(f_3 − f_{-3}) = G_3       1        0       −1        0        1
2π/3    ½(f_4 − f_{-4}) = G_4       ½√3     −½√3      0        ½√3     −½√3
5π/6    ½(f_5 − f_{-5}) = G_5       ½       −½√3      1       −½√3      ½

Sums                                3B_1     3B_2     3B_3     3B_4     3B_5

In illustration, for the empirical data

θ    0°    30°   60°   90°   120°  150°  180°  210°   240°   270°  300°  330°  360°
f    1.21  1.32  1.46  1.40  1.34  1.18  1.07  1.01   1.05   1.10  1.14  1.17  1.21
x    0     π/6   π/3   π/2   2π/3  5π/6  π     −5π/6  −2π/3  −π/2  −π/3  −π/6  0

the entries in the respective data columns of Tables 9.1a and b are

0.605
1.245    0.075
1.300    0.160
1.250    0.150
1.195    0.145
1.095    0.085
0.535

and calculation gives

A_0 = 1.204     A_1 = 0.084     A_2 = −0.062
A_3 = −0.012    A_4 = −0.009    B_1 = 0.165
B_2 = 0.001     B_3 = 0.003     B_4 = −0.007

for the coefficients of harmonics through the fourth. 
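
The desk calculation above can be reproduced directly from the summation formulas (9.3.6). The sketch below is illustrative and not part of the original text; the function name `harmonic_coefficients` and the dictionary layout keyed by the index r are assumptions made for the example.

```python
import math

def harmonic_coefficients(f_vals, N):
    # f_vals maps r = -N+1, ..., N to the ordinate f(x_r), with x_r = r*pi/N.
    xs = {r: r * math.pi / N for r in range(-N + 1, N + 1)}
    A = [sum(f_vals[r] for r in xs) / (2 * N)]
    B = [0.0]  # placeholder so that B[k] lines up with the kth harmonic
    for k in range(1, N + 1):
        scale = 2 * N if k == N else N  # A_N carries the factor 1/(2N) in (9.3.6)
        A.append(sum(f_vals[r] * math.cos(k * xs[r]) for r in xs) / scale)
        B.append(sum(f_vals[r] * math.sin(k * xs[r]) for r in xs) / N)
    return A, B

# Tabulated ordinates at 0, 30, ..., 330 degrees.
data = [1.21, 1.32, 1.46, 1.40, 1.34, 1.18, 1.07, 1.01, 1.05, 1.10, 1.14, 1.17]
N = 6
# Angles beyond 180 degrees correspond to negative x: 210 deg is r = -5, ..., 330 deg is r = -1.
f_vals = {(i if i <= N else i - 2 * N): v for i, v in enumerate(data)}

A, B = harmonic_coefficients(f_vals, N)
print([round(v, 4) for v in A[:5]])   # A_0 through A_4
print([round(v, 4) for v in B[1:5]])  # B_1 through B_4
```

Retaining every available harmonic (all of `A` together with `B[1]` through `B[N-1]`) reproduces the twelve ordinates exactly, which is the collocation property noted for n = N.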

In order to obtain a seven-point cosine approximation to a function 
F(x) over the half-range 0 ≤ x ≤ π, through harmonics of order not exceeding
six, use would be made of Table 9.1a only, whereas for a five-point sine approx- 
imation to G(x) over the same half-range, through harmonics of order not 
exceeding five, only Table 9.1b would be used. In any case, if all the available 
harmonics are retained, the resultant approximation takes on the prescribed
value at each of the points employed in the calculation and accordingly represents 
a collocative interpolation. Retention of a smaller number of harmonics leads 
to the appropriate least-squares approximation relevant to that set of points. 
Tables corresponding to those given here, but employing larger sets of data and 
further systematized in various ways, may be found in the literature (see Sec. 
9.19). A related procedure is described in Sec. 9.7. 


9.4 Exponential Approximation 

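In certain situations it is desired to determine an approximation of the form

$$f(x) \approx C_1 e^{a_1 x} + C_2 e^{a_2 x} + \cdots + C_n e^{a_n x} \qquad (9.4.1)$$

or, equivalently, of the form

$$f(x) \approx C_1 \mu_1^x + C_2 \mu_2^x + \cdots + C_n \mu_n^x \qquad (9.4.2)$$

where

$$\mu_k = e^{a_k} \qquad (9.4.3)$$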

It is somewhat more convenient here to work with the second form (9.4.2). 
We suppose that values of f(x) (exact or approximate) are specified on a 
set of N equally spaced points, and that a linear change of variables has been 
introduced in advance in such a way that the data points are x = 0, 1, 2, ...,
N − 1. If (9.4.1) were to be an equality for these values of x, the equations

$$\begin{aligned}
C_1 + C_2 + \cdots + C_n &= f_0 \\
C_1 \mu_1 + C_2 \mu_2 + \cdots + C_n \mu_n &= f_1 \\
C_1 \mu_1^2 + C_2 \mu_2^2 + \cdots + C_n \mu_n^2 &= f_2 \\
&\ \ \vdots \\
C_1 \mu_1^{N-1} + C_2 \mu_2^{N-1} + \cdots + C_n \mu_n^{N-1} &= f_{N-1}
\end{aligned} \qquad (9.4.4)$$

necessarily would be satisfied, and the approximation (9.4.2) may be based
on the result of satisfying these equations as nearly as possible. If the constants
$\mu_1, \ldots, \mu_n$ were known (or preassigned), this set would comprise N linear
equations in the n unknowns $C_1, \ldots, C_n$ and could be solved exactly if N = n
or approximately, by the least-squares method of Sec. 7.3, if N > n.

However, if the μ's are also to be determined, at least 2n equations are
needed, and the difficulty consists of the fact that the equations are nonlinear
in the μ's. This difficulty can be minimized by a method, similar to methods used
in Sec. 8.15, to be described next.


† For the cosine approximation, the result corresponds to the use of one-half weights
with respect to the errors at 0 and N (see Prob. 9).

Let $\mu_1, \ldots, \mu_n$ be the roots of the algebraic equation

$$\mu^n + \alpha_1 \mu^{n-1} + \alpha_2 \mu^{n-2} + \cdots + \alpha_{n-1} \mu + \alpha_n = 0 \qquad (9.4.5)$$

so that the left-hand member of (9.4.5) is identified with the product

$$(\mu - \mu_1)(\mu - \mu_2) \cdots (\mu - \mu_n)$$

In order to determine the coefficients $\alpha_1, \ldots, \alpha_n$, we multiply the first equation
in (9.4.4) by $\alpha_n$, the second equation by $\alpha_{n-1}$, ..., the nth equation by $\alpha_1$, and
the (n + 1)th equation by 1, and add the results. If use is made of the fact that
each μ satisfies (9.4.5), the result is seen to be of the form

$$f_n + \alpha_1 f_{n-1} + \cdots + \alpha_n f_0 = 0$$

A set of N − n − 1 additional equations of similar type is obtained
in the same way by starting instead successively with the second, third, ...,
(N − n)th equations. In this way we find that (9.4.4) and (9.4.5) imply the
N − n linear equations

$$\begin{aligned}
f_n + f_{n-1} \alpha_1 + f_{n-2} \alpha_2 + \cdots + f_0 \alpha_n &= 0 \\
f_{n+1} + f_n \alpha_1 + f_{n-1} \alpha_2 + \cdots + f_1 \alpha_n &= 0 \\
&\ \ \vdots \\
f_{N-1} + f_{N-2} \alpha_1 + f_{N-3} \alpha_2 + \cdots + f_{N-n-1} \alpha_n &= 0
\end{aligned} \qquad (9.4.6)$$

Since the ordinates $f_k$ are known, this set generally can be solved directly for
the n α's if N = 2n, or solved approximately by the method of least squares
if N > 2n.

After the α's are determined, the n μ's are found as the roots of (9.4.5).
They may be real or imaginary. The equations (9.4.4) then become linear 
equations in the n C’s, with known coefficients. The C’s can be determined, 
finally, from the first n of these equations or, preferably, by applying the least- 
squares technique to the entire set. Thus the nonlinearity of the system is 
concentrated in the single algebraic equation (9.4.5). The technique described 
is known as Prony’s method. 
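
The sequence of steps can be made concrete for the smallest nontrivial case. The following is a minimal sketch, assuming n = 2 terms and exactly N = 2n = 4 ordinates so that every linear system is 2 by 2; the helper names `solve_2x2` and `prony_two_terms` and the test data are illustrative and do not come from the text. A general implementation would replace the direct solves by least squares when N > 2n and would use a polynomial root finder for larger n.

```python
import cmath

def solve_2x2(a11, a12, a21, a22, b1, b2):
    # Cramer's rule for the system a11*x + a12*y = b1, a21*x + a22*y = b2.
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det

def prony_two_terms(f):
    # f = [f0, f1, f2, f3]: ordinates at x = 0, 1, 2, 3.
    f0, f1, f2, f3 = f
    # Equations (9.4.6): f2 + f1*alpha1 + f0*alpha2 = 0 and f3 + f2*alpha1 + f1*alpha2 = 0.
    alpha1, alpha2 = solve_2x2(f1, f0, f2, f1, -f2, -f3)
    # Roots of (9.4.5): mu^2 + alpha1*mu + alpha2 = 0 (cmath allows nonreal roots).
    disc = cmath.sqrt(alpha1 * alpha1 - 4 * alpha2)
    mu1, mu2 = (-alpha1 + disc) / 2, (-alpha1 - disc) / 2
    # First two equations of (9.4.4): C1 + C2 = f0 and C1*mu1 + C2*mu2 = f1.
    c1, c2 = solve_2x2(1, 1, mu1, mu2, f0, f1)
    return (mu1, mu2), (c1, c2)

# Exact samples of 3*(0.5)**x - 2*(0.25)**x at x = 0, 1, 2, 3 (made-up test data).
samples = [3 * 0.5 ** x - 2 * 0.25 ** x for x in range(4)]
mus, cs = prony_two_terms(samples)
print(mus, cs)
```

The only nonlinear step is the root extraction in the middle; the α's and the C's each come from linear equations.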

Obvious modifications are necessary when certain of the μ's (or a's)
are prescribed and the remainder are to be determined. When such constraints
are imposed, and are to be satisfied exactly, it is essential to satisfy them (by
using them to eliminate unknowns from the set of equations to be solved)
before applying the method of least squares.

The most common situation of this sort is that in which it is known that
f(x) tends to a finite limit (the value of which is generally unknown) as x → ∞.
The approximation

$$f(x) \approx C_0 + C_1 e^{a_1 x} + \cdots + C_n e^{a_n x} \qquad (9.4.7)$$

is then appropriate, where the a's are expected to have negative real parts. Since
this approximation implies that

$$\Delta f(x) \approx C_1' e^{a_1 x} + \cdots + C_n' e^{a_n x}$$

where the coefficient $C_k'$ is an unknown constant which is simply related to the
unknown $C_k$, the equations (9.4.6) may be modified, in this case, by replacing
each $f_k$ by the difference $\Delta f_k = f_{k+1} - f_k$, after which the α's and μ's are
determined as before. The equations (9.4.4) are then modified by the insertion
of the unknown $C_0$ in each left-hand member. At least N = 2n + 1 independent
data are needed for the determination.

If one or more of the μ's satisfying (9.4.5) are not real and positive, the
corresponding values of the a's in (9.4.1) will not be real. In particular, if
$\mu_k$ is real and negative, say $\mu_k = -\rho_k$, where $\rho_k$ is positive, the term $\mu_k^x =
(-\rho_k)^x$ is real only when x takes on the (integral) values for which data are
prescribed, or values which differ from those values by integral multiples of
the (unit) spacing. However, we may notice that $(-1)^x = \cos \pi x$ for any such
value of x. Hence, if we replace $(-\rho_k)^x$ by $\rho_k^x \cos \pi x$ or, equivalently, by
$e^{x \log \rho_k} \cos \pi x$, we so obtain a suitable interpolating function which is real for
all real values of x.

More generally, if one value of μ is nonreal, and hence expressible in the
polar form $\rho e^{i\beta}$, where ρ and β are real and ρ is positive, then the conjugate
$\rho e^{-i\beta}$ must also be involved, since the coefficients in (9.4.5) are necessarily
real. The corresponding part of (9.4.2) can then be written as

$$\rho^x \left(A_1 e^{i\beta x} + A_2 e^{-i\beta x}\right)$$

where $A_1$ and $A_2$ are constants which must be conjugate complex in order that
the expression be real when x is real. Hence, by writing $A_1 = (C_1 - iC_2)/2$
and $A_2 = (C_1 + iC_2)/2$, this part of the approximation can be expressed in the
more convenient form

$$\rho^x (C_1 \cos \beta x + C_2 \sin \beta x) = e^{x \log \rho} (C_1 \cos \beta x + C_2 \sin \beta x) \qquad (9.4.8)$$

after the μ's are determined from (9.4.5) and (9.4.6) but before equations
corresponding to (9.4.4) are formed and solved for the coefficients of the
approximating functions.
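
The passage from the complex form to the real form (9.4.8) uses only Euler's formula. It is spelled out here as a supplementary step, with $A_1 = (C_1 - iC_2)/2$ and $A_2 = (C_1 + iC_2)/2$ as in the text:

```latex
\begin{aligned}
\rho^x \left(A_1 e^{i\beta x} + A_2 e^{-i\beta x}\right)
  &= \rho^x \left[(A_1 + A_2) \cos \beta x + i\,(A_1 - A_2) \sin \beta x\right] \\
A_1 + A_2 &= C_1 \qquad i\,(A_1 - A_2) = i\,(-iC_2) = C_2
\end{aligned}
```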

In order to illustrate both the technique and the existence of unfavorable 
situations, we consider the attempt to recover the equation of the function 


$$f(x) = 2.32 - 1.08 e^{-x} + 1.20 e^{-2x} \qquad (9.4.9)$$

from the values of that function for x = 0, 1, 2, 3, and 4, under the hypothesis
that the numerical coefficients in (9.4.9) are exact. These values are given,
to four decimal places, in the following tabulation:

x       0        1        2        3        4
f(x)    2.4400   2.0851   2.1958   2.2692   2.3006


If the ordinates are arbitrarily rounded to two decimal places, the required
differences of the rounded values are found to be $-0.35$, 0.11, 0.07, and 0.03,
and Eqs. (9.4.6), with $f_k$ replaced by $\Delta f_k$, become

$$\begin{aligned}
0.07 + 0.11\alpha_1 - 0.35\alpha_2 &= 0 \\
0.03 + 0.07\alpha_1 + 0.11\alpha_2 &= 0
\end{aligned} \tag{9.4.10}$$

from which there follows $\alpha_1 = -\tfrac{91}{183} \doteq -0.497$ and $\alpha_2 = \tfrac{8}{183} \doteq 0.0437$.
Equation (9.4.5) then becomes

$$183\mu^2 - 91\mu + 8 = 0$$

and yields $\mu_1 \doteq 0.383$ and $\mu_2 \doteq 0.114$ to three places. Thus the required
approximation is to be of the form

$$f(x) \approx C_0 + C_1(0.383)^x + C_2(0.114)^x = C_0 + C_1 e^{-0.960x} + C_2 e^{-2.17x} \tag{9.4.11}$$

after which the $C$'s may be determined by fitting the data at three points, or
by use of a least-squares procedure over the five points for which data are
provided. More nearly accurate determinations of the decay factors would have
resulted from a reduction of inherent errors in the data employed, or from the
result of using additional data to supply additional equations, and solving the
resultant set approximately by least-squares methods.

Suppose, however, that the values $f(1) = 2.0851$ and $f(2) = 2.1958$
were incorrectly rounded to 2.08 and 2.19, respectively. We notice that the
roundoff errors so introduced are only slightly greater than those effected by the
correct rounding, and we may consider these additional errors as representative
of observational errors which could result if the data were empirical. The four
relevant differences are then $-0.36$, 0.11, 0.08, and 0.03, and the equations
replacing (9.4.10) become

$$\begin{aligned}
0.08 + 0.11\alpha_1 - 0.36\alpha_2 &= 0 \\
0.03 + 0.08\alpha_1 + 0.11\alpha_2 &= 0
\end{aligned} \tag{9.4.12}$$

from which there follows $\alpha_1 = -\tfrac{196}{409} \doteq -0.479$ and $\alpha_2 = \tfrac{31}{409} \doteq 0.0758$.
The equation which determines approximations to $\mu_1$ and $\mu_2$ is then
$409\mu^2 - 196\mu + 31 = 0$, which yields the nonreal roots $\mu_{1,2} \doteq 0.240 \pm 0.136i$. Since,
accordingly, $\mu_{1,2} \doteq e^{-1.29 \pm 0.515i}$, the form replacing (9.4.11) here becomes

$$f(x) \approx C_0 + e^{-1.29x}(C_1 \cos 0.515x + C_2 \sin 0.515x) \tag{9.4.13}$$

from which the $C$'s may be determined by collocation or by least squares.

Whereas it is found that the coefficients in (9.4.11) and (9.4.13) can be
determined in such a way that they both provide good approximations to the
true function (9.4.9) for $0 \le x \le 4$ and, indeed, depart only slightly from
$f(x)$ for all $x \ge 0$, the latter approximation is oscillatory, while the true function
and the former approximation are not. The slight additional errors introduced
into the given data here lead to completely incorrect information concerning the
decay factors.
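
The two calculations just described can be retraced numerically. The following Python sketch is not part of the original text; the function name `prony_two_term` and its return layout are assumptions made for illustration. It differences the five ordinates, solves the pair of equations corresponding to (9.4.10) or (9.4.12) by Cramer's rule, and finds the roots of $\mu^2 + \alpha_1\mu + \alpha_2 = 0$ in complex arithmetic, so that the real and nonreal cases are treated alike.

```python
import cmath

def prony_two_term(values):
    """Return (alpha1, alpha2, mu1, mu2) for five equally spaced ordinates.

    Assumes the model C0 + C1*mu1**x + C2*mu2**x, so the constant C0 is
    removed by differencing before the 2x2 system is solved.
    """
    d = [values[k + 1] - values[k] for k in range(4)]   # forward differences
    # d[k+2] + alpha1*d[k+1] + alpha2*d[k] = 0 for k = 0, 1
    a11, a12, b1 = d[1], d[0], -d[2]
    a21, a22, b2 = d[2], d[1], -d[3]
    det = a11 * a22 - a12 * a21
    alpha1 = (b1 * a22 - a12 * b2) / det
    alpha2 = (a11 * b2 - a21 * b1) / det
    # Roots of mu**2 + alpha1*mu + alpha2 = 0, computed in complex arithmetic.
    root = cmath.sqrt(alpha1 * alpha1 - 4.0 * alpha2)
    mu1 = (-alpha1 + root) / 2.0
    mu2 = (-alpha1 - root) / 2.0
    return alpha1, alpha2, mu1, mu2

# Correctly rounded ordinates, then the incorrectly rounded set.
correct = prony_two_term([2.44, 2.09, 2.20, 2.27, 2.30])
perturbed = prony_two_term([2.44, 2.08, 2.19, 2.27, 2.30])
```

With the first data set both roots have zero imaginary part; with the second they form a conjugate pair, which is the change of character noted above.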

While this example was selected deliberately to illustrate a particularly 
unfavorable situation, this type of “instability” is of common occurrence when 
it is necessary to determine the approximating coordinate functions themselves, 
in addition to the constants of combination to be associated with them. In 
such cases, it is particularly desirable that an error analysis be made. 

Since here the true values of $\mu_1$ and $\mu_2$ are $e^{-1} \doteq 0.368$ and $e^{-2} \doteq 0.135$,
the true values of $\alpha_1$ and $\alpha_2$ are $-(e^{-1} + e^{-2}) \doteq -0.503$ and $e^{-3} \doteq 0.0498$.
Thus, in the second calculation, errors of magnitude smaller than 0.006 in
the data employed lead to errors of about 0.024 and 0.026 in the calculation of
$\alpha_1$ and $\alpha_2$, respectively, and these errors, in turn, lead to nonreal approximations
of the real $\mu_1$ and $\mu_2$. The possibility of appreciably larger errors than those
actually encountered in the calculation of $\alpha_1$ and $\alpha_2$ from either of the sets
(9.4.10) and (9.4.12), assuming the coefficients to be correct to the places given,
could have been predicted by an analysis of those sets.† Once such estimates are
obtained, the maximum (or RMS) values of the errors $\delta\mu_1$ and $\delta\mu_2$ in the roots
of $\mu^2 + \alpha_1\mu + \alpha_2 = 0$ may be estimated, by use of the differential relation

$$(2\mu + \alpha_1)\,\delta\mu + \mu\,\delta\alpha_1 + \delta\alpha_2 = 0$$

as

$$|\delta\mu_1|_{\max} \approx \frac{1 + |\mu_1|}{|\mu_1 - \mu_2|}\,|\delta\alpha|_{\max} \qquad |\delta\mu_2|_{\max} \approx \frac{1 + |\mu_2|}{|\mu_2 - \mu_1|}\,|\delta\alpha|_{\max} \tag{9.4.14}$$

or

$$(\delta\mu_1)_{\rm RMS} \approx \frac{\sqrt{1 + \mu_1^2}}{|\mu_1 - \mu_2|}\,(\delta\alpha)_{\rm RMS} \qquad (\delta\mu_2)_{\rm RMS} \approx \frac{\sqrt{1 + \mu_2^2}}{|\mu_2 - \mu_1|}\,(\delta\alpha)_{\rm RMS} \tag{9.4.15}$$

with $\mu_1$ and $\mu_2$ replaced by their calculated values, if those calculated values are
real and if the errors are small. The reality of $\mu_1$ and $\mu_2$ depends upon the
positivity of $\alpha_1^2 - 4\alpha_2$ and accordingly is in doubt if

$$|\alpha_1^2 - 4\alpha_2| < 2(2 + |\alpha_1|)\,|\delta\alpha|_{\max}$$

when $\alpha_1$ and $\alpha_2$ are estimated by their calculated values. Similar considerations
apply to the more involved cases in which more coordinate functions are
employed.

† The analysis of RMS errors relevant to the normal equations obtained in a least-
squares procedure is described in Sec. 7.4 [see (7.4.30)]. For the corresponding
analysis of maximum errors when least-squares methods are not used, as in the
present case, see Sec. 10.7.
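
The reality criterion and the estimate (9.4.14) are easily mechanized. The sketch below is illustrative only: the function names are hypothetical, and the trial values 0.026 and 0.006 supplied for $|\delta\alpha|_{\max}$ are assumptions chosen to match the sizes of the errors quoted in this section, not bounds derived by the analysis of Sec. 10.7.

```python
def roots_reality_in_doubt(alpha1, alpha2, dalpha_max):
    """True when |alpha1^2 - 4*alpha2| < 2*(2 + |alpha1|)*|dalpha|_max."""
    discriminant = alpha1 * alpha1 - 4.0 * alpha2
    return abs(discriminant) < 2.0 * (2.0 + abs(alpha1)) * dalpha_max

def max_root_error(mu_a, mu_b, dalpha_max):
    """Estimate (9.4.14) for the root mu_a; meaningful for real, distinct roots."""
    return (1.0 + abs(mu_a)) / abs(mu_a - mu_b) * dalpha_max

# Values from the first calculation, with two assumed error levels.
doubt_large = roots_reality_in_doubt(-0.497, 0.0437, 0.026)
doubt_small = roots_reality_in_doubt(-0.497, 0.0437, 0.006)
estimate = max_root_error(0.383, 0.114, 0.006)
```

Under these assumed levels, errors in the $\alpha$'s of the size met in the second calculation put the reality of the roots in doubt, whereas errors of the size of the data errors alone do not.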


9.5 Determination of Constituent Periodicities 


It frequently happens that an empirical function f(x) is known to be expressible 
as a linear combination of two or more periodic terms whose periods are 
unknown and are not necessarily commensurable, and the approximate deter- 
mination of these periods from empirical data is often of considerable 
importance. 

If $m$ distinct periods, denoted by $2\pi/\omega_1, \ldots, 2\pi/\omega_m$, are known (or
assumed) to be present, then $f(x)$ correspondingly can be assumed to be
approximated by an expression of the form

$$f(x) \approx A_1 \cos \omega_1 x + B_1 \sin \omega_1 x + \cdots + A_m \cos \omega_m x + B_m \sin \omega_m x \tag{9.5.1}$$

But such an approximation is a special case of (9.4.1), in which $n = 2m$ and
in which the $a$'s are identified, respectively, with $i\omega_1, -i\omega_1, \ldots, i\omega_m$, and
$-i\omega_m$. Thus the desired values of $\omega$ may be obtained, by Prony's method, if
we set $n = 2m$ and $\mu = e^{i\omega}$ in (9.4.5) and (9.4.6), again assuming $f(x)$ to be
given for $x = 0, 1, 2, \ldots, N - 1$.†

Since, in this case, the roots of (9.4.5) are known (or required) to occur
in reciprocal pairs $(e^{i\omega_k}, e^{-i\omega_k})$, it follows that (9.4.5) must be invariant under
the substitution of $1/\mu$ for $\mu$, so that we must have $\alpha_{2m} = 1$, $\alpha_{2m-1} = \alpha_1, \ldots,$
$\alpha_{m+1} = \alpha_{m-1}$. Thus, with $\mu = e^{i\omega}$, Eq. (9.4.5) becomes

$$e^{2mi\omega} + \alpha_1 e^{(2m-1)i\omega} + \cdots + \alpha_{m-1} e^{(m+1)i\omega} + \alpha_m e^{mi\omega} + \alpha_{m-1} e^{(m-1)i\omega} + \cdots + \alpha_1 e^{i\omega} + 1 = 0$$

or

$$e^{im\omega}\left[(e^{im\omega} + e^{-im\omega}) + \alpha_1 (e^{i(m-1)\omega} + e^{-i(m-1)\omega}) + \cdots + \alpha_{m-1}(e^{i\omega} + e^{-i\omega}) + \alpha_m\right] = 0$$

and hence, finally, since $e^{im\omega} \ne 0$, we find that the equation determining $\omega$
is of the form

$$2 \cos m\omega + 2\alpha_1 \cos (m - 1)\omega + \cdots + 2\alpha_{m-1} \cos \omega + \alpha_m = 0 \tag{9.5.2}$$

Since $\cos k\omega$ is expressible as a polynomial of degree $k$ in $\cos \omega$, this equation
can be expressed as an algebraic equation of degree $m$ in $\cos \omega$. Indeed, it
can be expressed in the form

$$T_m(\cos \omega) + \alpha_1 T_{m-1}(\cos \omega) + \cdots + \alpha_{m-1} T_1(\cos \omega) + \tfrac{1}{2}\alpha_m = 0$$

in terms of the Chebyshev polynomials of Sec. 7.9.

† In the more general case, the dimensionless variable $x$ accordingly represents
displacement from a reference point in units of the actual spacing $h$, as in the preceding
section, and the calculated periods are also to be considered as expressed in units
of $h$.

The $N - n$ equations (9.4.6), which serve to determine the coefficients $\alpha_1, \ldots,$
$\alpha_m$ in (9.5.2), reduce (again with $n = 2m$, $\alpha_{m+j} = \alpha_{m-j}$, and $\alpha_{2m} = 1$) to the
forms

$$\begin{aligned}
(f_0 + f_{2m}) + (f_1 + f_{2m-1})\alpha_1 + (f_2 + f_{2m-2})\alpha_2 + \cdots + (f_{m-1} + f_{m+1})\alpha_{m-1} + f_m\alpha_m &= 0 \\
(f_1 + f_{2m+1}) + (f_2 + f_{2m})\alpha_1 + (f_3 + f_{2m-1})\alpha_2 + \cdots + (f_m + f_{m+2})\alpha_{m-1} + f_{m+1}\alpha_m &= 0 \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
(f_{N-2m-1} + f_{N-1}) + (f_{N-2m} + f_{N-2})\alpha_1 + (f_{N-2m+1} + f_{N-3})\alpha_2 + \cdots + (f_{N-m-2} + f_{N-m})\alpha_{m-1} + f_{N-m-1}\alpha_m &= 0
\end{aligned} \tag{9.5.3}$$

In accordance with the fact that the approximation (9.5.1) involves $3m$ unknown
constants, we must have $N \ge 3m$. The set (9.5.3) then comprises $N - 2m \ge m$
equations in the $m$ unknowns $\alpha_k$.

This set is to be solved (approximately, by least squares, if $N > 3m$)
for the $\alpha$'s, and the $\omega$'s are then determined from (9.5.2), after which the
coefficients in (9.5.1) are determined (if their values are desired) by writing down
the conditions which would require (9.5.1) to be an equality for at least $2m$
of the $N$ relevant values of $x$ and solving that set approximately, by the method
of least squares, if more than $2m$ conditions are used.

If, in addition, an unknown constant $A_0$ is present in the right-hand
member of (9.5.1), Eqs. (9.5.3) are to be modified by replacing each $f_k$ by $\Delta f_k$,
and the constant $A_0$ will then appear only in the set of equations determining
the coefficients in (9.5.1). Here we must have $N \ge 3m + 1$ given data.

As a simple illustration, we attempt to determine the constituent periods
of the function

$$f(x) = \cos \frac{\pi x}{3} + \sin \frac{\pi x}{4} \tag{9.5.4}$$

assuming knowledge only of the following rounded values of that function:

    x      0     1     2     3      4      5      6     7      8      9      10
    f(x)   1.00  1.21  0.50  -0.29  -0.50  -0.21  0.00  -0.21  -0.50  -0.29  0.50

If we suppose that the vanishing of the overall mean value of $f(x)$ is
not known in advance but that there is evidence (from physical considerations
or otherwise) that the deviation from the mean is due to the superposition of two
periodic processes, we first calculate the relevant differences:

    x       0     1      2      3      4     5     6      7      8     9
    Δf(x)   0.21  -0.71  -0.79  -0.21  0.29  0.21  -0.21  -0.29  0.21  0.79
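Next, the six equations corresponding to (9.5.3) are written down

$$\begin{aligned}
 0.50 - 0.92\alpha_1 - 0.79\alpha_2 &= 0 \\
-0.50 - 0.50\alpha_1 - 0.21\alpha_2 &= 0 \\
-1.00 + 0.29\alpha_2 &= 0 \\
-0.50 + 0.08\alpha_1 + 0.21\alpha_2 &= 0 \\
 0.50 - 0.08\alpha_1 - 0.21\alpha_2 &= 0 \\
 1.00 - 0.29\alpha_2 &= 0
\end{aligned} \tag{9.5.5}$$

and the relevant normal equations are obtained in the form

$$\begin{aligned}
1.1092\alpha_1 + 0.8654\alpha_2 &= 0.2900 \\
0.8654\alpha_1 + 0.9246\alpha_2 &= 1.0800
\end{aligned} \tag{9.5.6}$$

The solution is found to be $\alpha_1 = -2.4092$, $\alpha_2 = 3.4230$, to four places, after
which (9.5.2) becomes

$$2 \cos 2\omega - 4.8184 \cos \omega + 3.4230 = 0$$

or

$$4 \cos^2 \omega - 4.8184 \cos \omega + 1.4230 = 0 \tag{9.5.7}$$

and yields the values

$$\cos \omega_1 = 0.5186 \qquad \cos \omega_2 = 0.6860$$

from which the appropriate approximations to the true periods

$$P_1 = \frac{2\pi}{\omega_1} = 6 \qquad P_2 = \frac{2\pi}{\omega_2} = 8$$

are found to be

$$P_1 \approx \frac{2\pi}{1.0256} \doteq 6.126 \qquad P_2 \approx \frac{2\pi}{0.8147} \doteq 7.712 \tag{9.5.8}$$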


Hence the roundoff errors introduced into the given data here lead to errors of
about 2 and 4 percent in the calculations of $P_1$ and $P_2$, respectively. The
corresponding approximation to the governing equation would then be obtained,
if it were desired, by fitting the equation

$$f(x) \approx A_0 + A_1 \cos 1.026x + B_1 \sin 1.026x + A_2 \cos 0.815x + B_2 \sin 0.815x \tag{9.5.9}$$

to the data by use of the least-squares procedure.
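
The arithmetic of this illustration can be retraced by machine. The Python sketch below is an illustration only and is not taken from the text; the name `prony_two_periods` is hypothetical, and the routine is restricted to the case $m = 2$ with an unknown constant term, so that differences of the data are used. It assumes that the quadratic in $\cos \omega$ has real roots of magnitude not exceeding unity, as happens for the data given here.

```python
import math

def prony_two_periods(f):
    """Least-squares estimate of cos(omega_1), cos(omega_2) from ordinates f.

    Assumes two periodic terms plus an unknown constant, so differences
    of f are used in the symmetric equations (9.5.3) with m = 2.
    """
    d = [f[k + 1] - f[k] for k in range(len(f) - 1)]
    rows = []
    for k in range(len(d) - 4):
        c = d[k] + d[k + 4]          # constant term
        a = d[k + 1] + d[k + 3]      # multiplies alpha1
        b = d[k + 2]                 # multiplies alpha2
        rows.append((a, b, c))
    # Normal equations for minimizing sum (c + a*alpha1 + b*alpha2)**2.
    saa = sum(a * a for a, b, c in rows)
    sab = sum(a * b for a, b, c in rows)
    sbb = sum(b * b for a, b, c in rows)
    rhs1 = -sum(a * c for a, b, c in rows)
    rhs2 = -sum(b * c for a, b, c in rows)
    det = saa * sbb - sab * sab
    alpha1 = (rhs1 * sbb - sab * rhs2) / det
    alpha2 = (saa * rhs2 - sab * rhs1) / det
    # (9.5.2) with m = 2: 4 t^2 + 2 alpha1 t + (alpha2 - 2) = 0, t = cos(omega).
    disc = math.sqrt(4.0 * alpha1 * alpha1 - 16.0 * (alpha2 - 2.0))
    t_small = (-2.0 * alpha1 - disc) / 8.0
    t_large = (-2.0 * alpha1 + disc) / 8.0
    return alpha1, alpha2, t_small, t_large

data = [1.00, 1.21, 0.50, -0.29, -0.50, -0.21, 0.00, -0.21, -0.50, -0.29, 0.50]
alpha1, alpha2, t1, t2 = prony_two_periods(data)
periods = [2.0 * math.pi / math.acos(t) for t in (t1, t2)]
```

The list `periods` then holds the two approximate periods, in units of the spacing, for comparison with the true values 6 and 8.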

Here, and in the general case, it may be noticed that only the value of
$\cos \omega_k$ is determinate. Thus, if we denote by $\bar\omega_k$ the admissible value of $\omega_k$
which lies between 0 and $\pi$, we can conclude only that the proper approximate
value of $\omega_k$ is one of the numbers $\pm\bar\omega_k + 2r\pi$ $(r = 0, 1, 2, \ldots)$, so that if the
true physical spacing is $h$, the actual approximate period is known only to be
one of the numbers

$$\frac{2\pi h}{\bar\omega_k + 2r\pi} \qquad \frac{2\pi h}{2\pi - \bar\omega_k + 2r\pi}$$

Of these possibilities, only those corresponding to $r = 0$ can exceed the spacing
$h$; the first $(2\pi h/\bar\omega_k)$ exceeds $2h$, whereas the second $[2\pi h/(2\pi - \bar\omega_k)]$ lies
between $h$ and $2h$. The data employed clearly cannot be expected to determine
periods smaller than the spacing with any appreciable accuracy, in general.
Whether or not either of the two remaining appropriate alternatives truly 
represents an approximate period could be determined mathematically by 
investigating whether a second calculation based on a set of additional data, with 
a spacing incommensurable with h, also yields that alternative. In practice, the 
decision frequently can be based more simply on an inspection of the graph of 
the data or on physical considerations. 
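
The ambiguity just described may be made concrete by listing the alternatives. In the following sketch, which is not part of the original text, the function name `candidate_periods` and the cutoff parameter `r_max` are assumptions introduced for illustration; the routine merely tabulates the two families of admissible periods for a computed value of $\cos \omega_k$ and a spacing $h$.

```python
import math

def candidate_periods(cos_omega, h=1.0, r_max=1):
    """List the periods consistent with a computed cos(omega) and spacing h."""
    omega_bar = math.acos(cos_omega)          # admissible value in [0, pi]
    periods = []
    for r in range(r_max + 1):
        periods.append(2.0 * math.pi * h / (omega_bar + 2.0 * r * math.pi))
        periods.append(2.0 * math.pi * h
                       / (2.0 * math.pi - omega_bar + 2.0 * r * math.pi))
    return periods

# For cos(omega) = 0.5 the r = 0 alternatives come first in the list.
alternatives = candidate_periods(0.5, h=1.0, r_max=1)
```

Only the first two entries, which correspond to $r = 0$, exceed the spacing; the remaining entries are the shorter periods which the data cannot be expected to resolve.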

Situations in which two or more of the constituent periods are nearly 
equal are of frequent practical occurrence and are the most troublesome ones. 
In such cases it is particularly important to retain sufficiently many terms in 
the approximation and to use a sufficiently large set of data when the data are 
inexact. An interesting example of this type is treated successfully in Whittaker 
and Robinson [1944] (Sec. 175) by a method which differs from the present one, 
and also in Willers [1950] (Sec. 30) by a method equivalent to that given here. 
In that case, 600 empirical data are available for the determination of two 
constituent periods, and all are employed in the former treatment. Whereas 
the latter treatment does not use all the 595 equations which could be formed, 
in analogy to (9.5.5), it first makes use of a selected set of 78 equations whose 
formation involves the use of most of the data, and then checks the results by a 
recalculation using a judiciously chosen similar set of 17 equations. 
9.6 Optimum Polynomial Interpolation with Selected Abscissas


It has been shown in earlier chapters that, if a function $f(x)$ is approximated
by the interpolation polynomial $y(x)$ of degree $n$ which agrees with $f(x)$ at $n + 1$
points $x_0, x_1, \ldots, x_n$, we may write

$$f(x) = y(x) + \frac{f^{(n+1)}(\xi)}{(n + 1)!}\,\pi(x) \tag{9.6.1}$$

where

$$\pi(x) = (x - x_0)(x - x_1) \cdots (x - x_n) \tag{9.6.2}$$

and where $\xi$ lies in the interval $I$ limited by the largest and smallest of $x_0,$
$x_1, \ldots, x_n$, and $x$. We suppose here that an appropriate change of variables has
reduced this interval to the interval $[-1, 1]$.

Furthermore, in the preceding chapter it was seen that appropriate choices 
of the n + 1 abscissas lead to quadrature formulas having certain desirable 
characteristics. In this section we investigate briefly a related class of interpola- 
tion formulas and single out a particular formula which is related to trigonometric 
approximation in the following section. 

Whereas the parameter $\xi$ in (9.6.1) depends upon the $n + 1$ abscissas
and the variable $x$, the nature of that dependence will depend, in turn, upon
the nature of the function $f(x)$. Thus, if we desire to choose the abscissas in
such a way that the error $E(x) = f(x) - y(x)$ will tend to be as small as possible
over $[-1, 1]$, in some sense, for the set of all such functions having $n + 1$
continuous derivatives in $[-1, 1]$, we may attempt to make $|\pi(x)|$ as small as
possible in the same sense in that interval, recalling that the coefficient of the
highest power of $x$ in $\pi(x)$ must be unity from (9.6.2).

In particular, we may simulate the condition

$$\int_{-1}^{1} w(x)[E(x)]^2\,dx = \min$$

by the requirement

$$\int_{-1}^{1} w(x)[\pi(x)]^2\,dx = \min \tag{9.6.3}$$

where $w(x)$ is a prescribed weighting function which is nonnegative in $[-1, 1]$.
If we notice that $\pi(x)$ is expressible in the form

$$\pi(x) = x^{n+1} + c_n x^n + c_{n-1} x^{n-1} + \cdots + c_2 x^2 + c_1 x + c_0 \tag{9.6.4}$$

and hence may be considered to be specified by the $n + 1$ coefficients $c_0,$
$c_1, \ldots, c_n$, we deduce that (9.6.3) leads to the requirement that the partial
derivative of the left-hand member with respect to each $c_r$ must vanish. Since
also

$$\frac{\partial \pi(x)}{\partial c_r} = x^r \qquad (r = 0, 1, \ldots, n) \tag{9.6.5}$$

this requirement becomes

$$2\int_{-1}^{1} w(x)\,\frac{\partial \pi(x)}{\partial c_r}\,\pi(x)\,dx = 2\int_{-1}^{1} w(x)\pi(x)x^r\,dx = 0 \qquad (r = 0, 1, \ldots, n) \tag{9.6.6}$$

so that $\pi(x)$ is to be that polynomial of degree $n + 1$, with leading coefficient
unity, which is orthogonal to all polynomials of inferior degree over $[-1, 1]$
relative to $w(x)$. The abscissas of the $n + 1$ points of collocation, at which the
agreement between $f(x)$ and the polynomial approximation should be effected,
are thus the zeros of that polynomial.

It is of interest to notice that, with the interpolation polynomial so
determined, the integral approximation

$$\int_{-1}^{1} w(x)f(x)\,dx \approx \int_{-1}^{1} w(x)y(x)\,dx$$

is the corresponding gaussian quadrature formula of Chap. 8.

Thus, in particular, if we take $w(x) = 1$, and so tend "on the average" to
minimize the integral of the square of the error $E(x)$ over the interval $[-1, 1]$,
it follows from the results of Sec. 7.6 that the $n + 1$ abscissas should be the zeros
of $P_{n+1}(x)$. Certain such sets of abscissas are listed in Table 8.1 (Sec. 8.5).


Further, if we take $w(x) = 1/\sqrt{1 - x^2}$, and so tend to minimize the
integral of $[E(x)]^2/\sqrt{1 - x^2}$, the results of Sec. 7.9 show that the abscissas
are to be the zeros of the $(n + 1)$th Chebyshev polynomial $T_{n+1}(x)$

$$T_{n+1}(x) = \cos\,[(n + 1)\cos^{-1} x] \tag{9.6.7}$$

and hence are given by

$$x_i = \cos\left(\frac{2i + 1}{n + 1}\,\frac{\pi}{2}\right) \qquad (i = 0, 1, \ldots, n) \tag{9.6.8}$$

Since the coefficient of $x^r$ in $T_r(x)$ is $2^{r-1}$, it follows also that then

$$\pi(x) = 2^{-n}\,T_{n+1}(x) \tag{9.6.9}$$

In addition, we may notice that the extreme values of $\pi(x)$ in $[-1, 1]$ are then
$\pm 2^{-n}$ and are taken on (with successively alternating signs) at the end points
$x = \pm 1$ and at $n$ additional interior points, each of which separates a pair of
adjacent abscissas. Thus, with this choice of the abscissas, the coefficient of
$f^{(n+1)}(\xi)/(n + 1)!$ in the error term of (9.6.1) oscillates with constant amplitude
$2^{-n}$ as $x$ increases from $-1$ to 1.


On the other hand, since the coefficient of $x^r$ in $P_r(x)$ is $2^{-r}(2r)!/(r!)^2$,
the use of the zeros of $P_{n+1}(x)$ as the abscissas of collocation corresponds to the
identification

$$\pi(x) = \frac{2^{n+1}[(n + 1)!]^2}{(2n + 2)!}\,P_{n+1}(x) \tag{9.6.10}$$

Now the Legendre polynomial takes on the value $+1$ at $x = +1$ and the value
$(-1)^{n+1}$ at $x = -1$, and $P_{n+1}(x)$ performs oscillations in $[-1, 1]$ in such a
way that the $n$ successive maxima and minima separating pairs of adjacent zeros
inside the interval decrease in magnitude toward the center of that interval. Thus,
in particular, the maximum absolute value of $\pi(x)$ in (9.6.10), over $[-1, 1]$,
is given by the numerical factor in that equation, which is approximated by
$2^{-n}\sqrt{\pi n/4}$ when $n$ is large.

Hence it follows that, whereas use of the zeros of $P_{n+1}(x)$ minimizes the
RMS value of $\pi(x)$ over $[-1, 1]$, the use of the zeros of $T_{n+1}(x)$ leads to a value
of $|\pi(x)|_{\max}$ which is smaller than that corresponding to the former choice, by a
factor which tends to increase in proportion to $n^{1/2}$ as $n$ increases. Furthermore,
the error will tend to oscillate uniformly over [ —1, 1] in the second case, whereas 
it will tend to oscillate with an amplitude increasing toward the ends of the 
interval in the first case, on the average. Thus, if it is desirable to control the 
maximum error, rather than the RMS error, the second choice generally will be 
preferable to the first. 
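
The comparison of the two choices of abscissas can be checked directly from the numerical factors in (9.6.9) and (9.6.10). The short Python sketch below is illustrative and not part of the original text; the function names are hypothetical. It evaluates the maximum of $|\pi(x)|$ for each choice and prints their ratio beside the large-$n$ estimate $\sqrt{\pi n/4}$, which is only asymptotic and so is not expected to agree closely for small $n$.

```python
import math

def chebyshev_max(n):
    """max |pi(x)| on [-1, 1] when the n + 1 abscissas are zeros of T_{n+1}."""
    return 2.0 ** (-n)

def legendre_max(n):
    """max |pi(x)| on [-1, 1] when the abscissas are zeros of P_{n+1}."""
    return (2.0 ** (n + 1) * math.factorial(n + 1) ** 2
            / math.factorial(2 * n + 2))

for n in (2, 5, 10, 20):
    ratio = legendre_max(n) / chebyshev_max(n)
    print(n, ratio, math.sqrt(math.pi * n / 4.0))
```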

Indeed, it was discovered by Chebyshev that this choice is the best possible
one, when the maximum-error criterion is adopted. The proof follows most
easily by assuming, on the contrary, that there exists a monic polynomial $\pi(x)$
of degree $n + 1$ (with leading coefficient unity) whose maximum absolute value
on $[-1, 1]$ is smaller than $2^{-n}$. Then the difference $\pi(x) - 2^{-n}T_{n+1}(x)$ is
negative at the maxima of $T_{n+1}(x)$ and positive at its minima. Hence, since
$2^{-n}T_{n+1}(x)$ takes on its extreme values $(\pm 2^{-n})$ at $n + 2$ points of $[-1, 1]$,
including the ends, the difference $\pi(x) - 2^{-n}T_{n+1}(x)$ must vanish at least $n + 1$
times. But, since this difference is a polynomial of degree $n$ or less (the common
leading term $x^{n+1}$ being removed by the subtraction), this situation is impossible,
and the desired contradiction is obtained.

It should not be forgotten that this result applies only to the minimization
of the maximum value of $|\pi(x)|$ over $[-1, 1]$. For any specific function $f(x)$,
the maximum absolute value of the error $\pi(x)f^{(n+1)}(\xi)/(n + 1)!$ in (9.6.1)
generally will not be minimized exactly since the maximum value of $|f^{(n+1)}(\xi)|$
generally will not be attained in correspondence with an abscissa for which
$|\pi(x)|$ is greatest.
9.7 Chebyshev Interpolation


In this section, we consider in more detail the polynomial interpolation formula
based on collocation at the zeros of $T_{n+1}(x)$. Since any polynomial of degree $n$
can be expressed as a linear combination of Chebyshev polynomials of degrees
zero through $n$, it is convenient to express the polynomial $y(x)$ which agrees
with $f(x)$ when $x = x_0, x_1, \ldots, x_n$, where $x_r$ is the $r$th zero of $T_{n+1}(x)$, in such
a form, and so to write

$$f(x) = \sum_{k=0}^{n} C_k T_k(x) + \frac{T_{n+1}(x)\,f^{(n+1)}(\xi)}{2^n (n + 1)!} \tag{9.7.1}$$

in accordance with (9.6.1), where $|\xi| < 1$ under the assumption that $x$ is in
$[-1, 1]$. The $C$'s are to be determined in such a way that the result of
suppressing the error term is correct when $x = x_0, x_1, \ldots, x_n$, where

$$x_i = \cos\left(\frac{2i + 1}{n + 1}\,\frac{\pi}{2}\right) \qquad (i = 0, 1, \ldots, n) \tag{9.7.2}$$


Whereas the desired interpolation polynomial could be expressed in the
lagrangian form of Chap. 3, the following alternative procedure usually is more
convenient for its determination. If we introduce the change of variables

$$x = \cos \theta \qquad (0 \le \theta \le \pi) \tag{9.7.3}$$

the requirement

$$f(x) \approx \sum_{k=0}^{n} C_k T_k(x) \qquad (-1 \le x \le 1) \tag{9.7.4}$$

becomes

$$F(\theta) \approx \sum_{k=0}^{n} C_k \cos k\theta \qquad (0 \le \theta \le \pi) \tag{9.7.5}$$

with the abbreviation

$$F(\theta) = f(\cos \theta) \tag{9.7.6}$$

The $C$'s are now to be determined in such a way that (9.7.5) is an equality
when $\theta = \theta_i$, where

$$\theta_i = \frac{2i + 1}{n + 1}\,\frac{\pi}{2} \qquad (i = 0, 1, \ldots, n) \tag{9.7.7}$$

Thus the agreement is to occur at the equally spaced points $\pi/(2n + 2),$
$3\pi/(2n + 2), \ldots, (2n + 1)\pi/(2n + 2)$, which are seen to be midway between the
successive points $0, \pi/(n + 1), 2\pi/(n + 1), \ldots, \pi$ which would have been
employed in the procedure of Sec. 9.3, as the points of collocation for the
determination of the $C$'s in an approximation of the form (9.7.5).
In analogy to corresponding results of that section, it happens that
$\cos j\theta$ and $\cos k\theta$ are orthogonal under summation over the $n + 1$ points
defined by (9.7.7) (see Prob. 23):

$$\sum_{r=0}^{n} \cos j\theta_r \cos k\theta_r =
\begin{cases}
0 & (j \ne k) \\
\dfrac{n + 1}{2} & (j = k \ne 0) \\
n + 1 & (j = k = 0)
\end{cases} \tag{9.7.8}$$

where $j$ and $k$ are nonnegative integers not exceeding $n$. Moreover, since the
left-hand member of (9.7.8) is identical with

$$\sum_{r=0}^{n} T_j(x_r)\,T_k(x_r)$$

it follows that, whereas $T_0(x), T_1(x), \ldots$ are orthogonal under integration
over $[-1, 1]$ relative to $w(x) = 1/\sqrt{1 - x^2}$, the functions $T_0(x), T_1(x), \ldots,$
$T_n(x)$ are orthogonal under summation over the zeros of $T_{n+1}(x)$, with a unit
weighting function.

The truth of (9.7.8) permits us to deduce immediately that the required
$C$'s are expressible in the form

$$C_0 = \frac{1}{n + 1}\sum_{r=0}^{n} F(\theta_r) \qquad C_k = \frac{2}{n + 1}\sum_{r=0}^{n} F(\theta_r)\cos k\theta_r \quad (k \ne 0) \tag{9.7.9}$$

where $\theta_r$ is defined by (9.7.7), or, alternatively, in the form

$$C_0 = \frac{1}{n + 1}\sum_{r=0}^{n} f(x_r) \qquad C_k = \frac{2}{n + 1}\sum_{r=0}^{n} f(x_r)\,T_k(x_r) \quad (k \ne 0) \tag{9.7.10}$$

where $x_r$ is defined by (9.7.2).
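
The formulas (9.7.9) translate almost directly into a program. The Python sketch below is an illustration and not part of the original text; the name `chebyshev_coefficients` is hypothetical. It samples the function at the points $\theta_r$ of (9.7.7) and forms the sums of (9.7.9) without any attempt at economy of operations.

```python
import math

def chebyshev_coefficients(f, n):
    """Coefficients C_0..C_n of the polynomial collocating f at the zeros of T_{n+1}."""
    thetas = [(2 * r + 1) * math.pi / (2 * n + 2) for r in range(n + 1)]
    samples = [f(math.cos(t)) for t in thetas]
    coeffs = []
    for k in range(n + 1):
        s = sum(fr * math.cos(k * t) for fr, t in zip(samples, thetas))
        # Factor 1/(n + 1) for k = 0 and 2/(n + 1) otherwise, as in (9.7.9).
        coeffs.append((1.0 if k == 0 else 2.0) * s / (n + 1))
    return coeffs

# Six-point (n = 5) interpolation of the exponential function on [-1, 1].
exp_coeffs = chebyshev_coefficients(math.exp, 5)
```

When the function supplied is itself a polynomial of degree not exceeding $n$, the coefficients returned are those of its exact Chebyshev expansion, apart from roundoff.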
Thus, for example, we may construct Table 9.2 for desk calculation when
$n = 5$. Here use has been made of the abbreviations

$$A = \cos \frac{\pi}{12} = \tfrac{1}{2}\sqrt{2 + \sqrt{3}} \doteq 0.96593 \qquad B = \cos \frac{5\pi}{12} = \tfrac{1}{2}\sqrt{2 - \sqrt{3}} \doteq 0.25882 \tag{9.7.11}$$

The dual headings permit the table to be used either with the function expressed
as $f(x)$ over $-1 \le x \le 1$, with the unequally spaced abscissas listed in the
third column, or with the function expressed as $F(\theta)$ over $0 \le \theta \le \pi$, with the
equally spaced abscissas listed in the first column.
Table 9.2

    θ        F(θ)         cos θ        cos 2θ    cos 3θ    cos 4θ    cos 5θ
             f(x)         x = T_1(x)   T_2(x)    T_3(x)    T_4(x)    T_5(x)

    π/12     F_1 = f_1     A            ½√3       ½√2       ½         B
    3π/12    F_2 = f_2     ½√2          0        -½√2      -1        -½√2
    5π/12    F_3 = f_3     B           -½√3      -½√2       ½         A
    7π/12    F_4 = f_4    -B           -½√3       ½√2       ½        -A
    9π/12    F_5 = f_5    -½√2          0         ½√2      -1         ½√2
    11π/12   F_6 = f_6    -A            ½√3      -½√2       ½        -B

In illustration, the coefficient of $\cos 4\theta$ in (9.7.5) is given by

$$C_4 = \tfrac{1}{3}\left(\tfrac{1}{2}F_1 - F_2 + \tfrac{1}{2}F_3 + \tfrac{1}{2}F_4 - F_5 + \tfrac{1}{2}F_6\right)$$

whereas the coefficient of $T_4(x)$ in (9.7.4) is given by

$$C_4 = \tfrac{1}{3}\left(\tfrac{1}{2}f_1 - f_2 + \tfrac{1}{2}f_3 + \tfrac{1}{2}f_4 - f_5 + \tfrac{1}{2}f_6\right)$$

In order to obtain exact fit at the six relevant points, all the harmonics
involved are to be used. The result of retaining a smaller number of harmonics
would give the corresponding least-squares approximation over the six points.


Once the $C$'s are determined, the evaluation of the right-hand member of the
approximation

$$f(x) \approx \sum_{k=0}^{n} C_k T_k(x) \tag{9.7.12}$$

at intermediate points is conveniently effected by the method of Clenshaw
(see Sec. 7.10).
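
For reference, one common form of the Clenshaw recurrence is sketched below in Python. The sketch is not taken from the text: the presentation of Sec. 7.10 is not reproduced in this section and may differ in notation, and the function name `clenshaw` is hypothetical. The recurrence $b_k = C_k + 2x\,b_{k+1} - b_{k+2}$ is run downward from $k = n$ to $k = 1$, after which the sum is $C_0 + x\,b_1 - b_2$.

```python
def clenshaw(coeffs, x):
    """Evaluate sum_k coeffs[k] * T_k(x) by the backward Clenshaw recurrence."""
    b1 = 0.0   # plays the role of b_{k+1}
    b2 = 0.0   # plays the role of b_{k+2}
    for c in reversed(coeffs[1:]):
        b1, b2 = c + 2.0 * x * b1 - b2, b1
    return coeffs[0] + x * b1 - b2

# T_3(0.5) = 4(0.5)^3 - 3(0.5), obtained from the single coefficient C_3 = 1.
value = clenshaw([0.0, 0.0, 0.0, 1.0], 0.5)
```

No Chebyshev polynomial is evaluated explicitly, and the number of multiplications is the same as that required by nested multiplication for an ordinary polynomial of the same degree.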


9.8 Economization of Polynomial Approximations 


It was seen in Sec. 7.9 that the $n$th-degree least-squares polynomial
approximation to a function $f(x)$ over $[-1, 1]$, where the integral of the product of
$1/\sqrt{1 - x^2}$ and the square of the error is to be minimized, is of the form

$$f(x) \approx \sum_{k=0}^{n} a_k T_k(x) \qquad (|x| \le 1) \tag{9.8.1}$$

where

$$a_0 = \frac{1}{\pi}\int_{-1}^{1} \frac{f(x)}{\sqrt{1 - x^2}}\,dx \qquad a_k = \frac{2}{\pi}\int_{-1}^{1} \frac{f(x)\,T_k(x)}{\sqrt{1 - x^2}}\,dx \quad (k \ne 0) \tag{9.8.2}$$

The approximation so determined generally will not be identified with 
that of (9.7.12) since the coefficients, determined in the one case by summation 
over a discrete set of points and in the other by integration over an interval, 
are generally unequal. However, the two approximations may be expected to be 
of similar nature, in the sense that the error associated with each will tend to 
oscillate with uniform amplitude over [—1, 1], whereas that afforded by the 
finite Legendre series arising from least-squares approximation with uniform 
weighting (Sec. 7.6) will tend to oscillate with an amplitude which increases 
toward the ends of that interval, on the average. Accordingly, if the smallness 
of the maximum error is to be the governing criterion, it may be expected that a 
satisfactory approximation may be afforded by fewer terms of a Chebyshev 
series than would be required for a Legendre series. 

Still another method of obtaining an approximation to a function f(x) 
as a linear combination of Chebyshev polynomials is due to Lanczos. Its use 
presumes that a polynomial approximation to f(x) is already available but that a 
more efficient one is desired. Specifically, it is supposed that one has the relation 


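$$f(x) = \sum_{k=0}^{n} A_k x^k + E_n(x) \tag{9.8.3}$$

where it is known that

$$|E_n(x)| < \varepsilon_1 \qquad (-1 \le x \le 1) \tag{9.8.4}$$

and that $\varepsilon_1$ is smaller than a prescribed error tolerance $\varepsilon$, whereas $|A_n| + \varepsilon_1$
is not a tolerable error, so that the last term in the approximation

$$f(x) \approx \sum_{k=0}^{n} A_k x^k \tag{9.8.5}$$

cannot be safely neglected.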

Now let the right-hand member of (9.8.5) be expanded in a series of
Chebyshev polynomials. Since that member is a polynomial of degree $n$, the resultant
series will terminate with the term involving $T_n(x)$ and hence will be of the form

$$\sum_{k=0}^{n} A_k x^k = \sum_{k=0}^{n} a_k T_k(x) \tag{9.8.6}$$

From the fact that the terms of highest degree in $T_r(x)$ are given by

$$T_r(x) = 2^{r-1}\left(x^r - \frac{r}{4}\,x^{r-2} + \cdots\right) \tag{9.8.7}$$

it follows that the result of expressing the two members of (9.8.6) in terms of
decreasing powers of $x$ will be of the form

$$\begin{aligned}
A_n x^n + A_{n-1} x^{n-1} + A_{n-2} x^{n-2} + \cdots
 = {} & 2^{n-1} a_n \left(x^n - \frac{n}{4}\,x^{n-2} + \cdots\right) \\
 & + 2^{n-2} a_{n-1} \left(x^{n-1} - \frac{n - 1}{4}\,x^{n-3} + \cdots\right)
   + 2^{n-3} a_{n-2} \left(x^{n-2} - \cdots\right) + \cdots
\end{aligned} \tag{9.8.8}$$

so that there must follow

$$a_n = 2^{-(n-1)} A_n \qquad a_{n-1} = 2^{-(n-2)} A_{n-1} \qquad a_{n-2} = 2^{-(n-3)}\left(A_{n-2} + \frac{n}{4}\,A_n\right) \tag{9.8.9}$$

and so forth.

Thus, if $n$ is sufficiently large, the coefficients of $T_n(x), T_{n-1}(x), \ldots,$
$T_{n-m+1}(x)$ in (9.8.6) will be small relative to the respective coefficients of
$x^n, x^{n-1}, \ldots, x^{n-m+1}$ in (9.8.3), for some $m$, and it may happen that

$$\left(|a_{n-m+1}| + |a_{n-m+2}| + \cdots + |a_n|\right) + \varepsilon_1$$

is smaller than $\varepsilon$ and hence is a tolerable error in the desired approximation to
$f(x)$. Since $|T_r(x)| \le 1$ on $[-1, 1]$, the last $m$ terms in the right-hand member
of (9.8.6) are then negligible, and the approximation (9.8.5) can then be replaced
by

$$f(x) \approx \sum_{k=0}^{n-m} a_k T_k(x) \tag{9.8.10}$$
where $m > 0$, after which this approximation can be transformed back to an
expression of the form

$$f(x) \approx \sum_{k=0}^{n-m} A_k' x^k \tag{9.8.11}$$

if this is desirable. In this way we obtain a polynomial approximation to $f(x)$
over $[-1, 1]$, involving fewer terms than would be required by the original
approximation and tending to involve the smallest possible number of
polynomial terms which will supply an accuracy within the prescribed tolerance
limits.†

† It should be noticed that, in this procedure, the error $E_n(x)$ in (9.8.3) is accepted as
a fixed error and an efficient approximation to $f(x) - E_n(x)$ is sought. Thus the
approximation obtained is generally not the best possible one but would differ little
from it if $|E_n(x)|$ were small relative to $\varepsilon$.
The transformations involved are facilitated by the use of the two
following sets of relations, the second set being taken from the results of Sec. 7.9,
and the first set being obtained by successively inverting the members of the
second set:

$$\begin{aligned}
1 &= T_0 & T_0 &= 1 \\
x &= T_1 & T_1 &= x \\
x^2 &= \tfrac{1}{2}(T_0 + T_2) & T_2 &= 2x^2 - 1 \\
x^3 &= \tfrac{1}{4}(3T_1 + T_3) & T_3 &= 4x^3 - 3x \\
x^4 &= \tfrac{1}{8}(3T_0 + 4T_2 + T_4) & T_4 &= 8x^4 - 8x^2 + 1 \\
x^5 &= \tfrac{1}{16}(10T_1 + 5T_3 + T_5) & T_5 &= 16x^5 - 20x^3 + 5x \\
x^6 &= \tfrac{1}{32}(10T_0 + 15T_2 + 6T_4 + T_6) & T_6 &= 32x^6 - 48x^4 + 18x^2 - 1 \\
x^7 &= \tfrac{1}{64}(35T_1 + 21T_3 + 7T_5 + T_7) & T_7 &= 64x^7 - 112x^5 + 56x^3 - 7x \\
x^8 &= \tfrac{1}{128}(35T_0 + 56T_2 + 28T_4 + 8T_6 + T_8) & T_8 &= 128x^8 - 256x^6 + 160x^4 - 32x^2 + 1 \\
x^9 &= \tfrac{1}{256}(126T_1 + 84T_3 + 36T_5 + 9T_7 + T_9) & T_9 &= 256x^9 - 576x^7 + 432x^5 - 120x^3 + 9x
\end{aligned} \tag{9.8.12}$$

In illustration, suppose that a polynomial approximation to $e^x$ is required
over $[-1, 1]$, with a tolerance of 0.01. The truncation of a Maclaurin series
gives an approximation of degree 5

$$e^x \approx 1 + x + \tfrac{1}{2}x^2 + \tfrac{1}{6}x^3 + \tfrac{1}{24}x^4 + \tfrac{1}{120}x^5 \equiv y(x) \tag{9.8.13}$$

with an error

$$|E(x)| = \frac{e^{\xi}}{720}\,x^6 < \frac{e}{720} < 0.0038 \tag{9.8.14}$$

for which the neglect of the term $x^5/120$ would admit the possibility of an error
exceeding the prescribed tolerance. The use of the first set of relations in (9.8.12)
transforms (9.8.13) into the equivalent form

$$y(x) = \tfrac{81}{64}T_0 + \tfrac{217}{192}T_1 + \tfrac{13}{48}T_2 + \tfrac{17}{384}T_3 + \tfrac{1}{192}T_4 + \tfrac{1}{1920}T_5 \tag{9.8.15}$$

where $T_k \equiv T_k(x)$. Neglect of the last two terms will introduce an additional
error not exceeding $\tfrac{11}{1920} < 0.0058$ for all $x$ in $[-1, 1]$. Thus, with a total error
smaller in magnitude than 0.0096, we have

$$e^x \approx \tfrac{81}{64}T_0 + \tfrac{217}{192}T_1 + \tfrac{13}{48}T_2 + \tfrac{17}{384}T_3 \tag{9.8.16}$$

or, after using the second set in (9.8.12),

$$e^x \approx \tfrac{1}{384}\left(382 + 383x + 208x^2 + 68x^3\right) \qquad (|x| \le 1) \tag{9.8.17}$$
For the purpose of comparison, it may be noted that a similar manipulation
gives the form

$$y(x) = \tfrac{47}{40}P_0 + \tfrac{309}{280}P_1 + \tfrac{5}{14}P_2 + \tfrac{19}{270}P_3 + \tfrac{1}{105}P_4 + \tfrac{1}{945}P_5 \tag{9.8.18}$$

in terms of the Legendre polynomials $P_k \equiv P_k(x)$. Here only the last term
could be neglected, so that a polynomial approximation of fourth degree would
be required.
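
The economization of the Maclaurin polynomial for $e^x$ carried out above in terms of Chebyshev polynomials can be reproduced in exact rational arithmetic. The following Python sketch is illustrative only; the names `to_chebyshev`, `to_powers`, and `economize` are hypothetical, the conversion tables are copied from (9.8.12) only through degree 5, and the rule of dropping terms from the end while the accumulated bound remains below the tolerance is one simple reading of the procedure.

```python
from fractions import Fraction

# POWER_TO_CHEB[k][j] is the coefficient of T_j in x**k (first set of (9.8.12)).
POWER_TO_CHEB = [
    [Fraction(1)],
    [0, Fraction(1)],
    [Fraction(1, 2), 0, Fraction(1, 2)],
    [0, Fraction(3, 4), 0, Fraction(1, 4)],
    [Fraction(3, 8), 0, Fraction(1, 2), 0, Fraction(1, 8)],
    [0, Fraction(5, 8), 0, Fraction(5, 16), 0, Fraction(1, 16)],
]
# CHEB_TO_POWER[k][j] is the coefficient of x**j in T_k (second set of (9.8.12)).
CHEB_TO_POWER = [
    [1], [0, 1], [-1, 0, 2], [0, -3, 0, 4], [1, 0, -8, 0, 8], [0, 5, 0, -20, 0, 16],
]

def to_chebyshev(power_coeffs):
    out = [Fraction(0)] * len(power_coeffs)
    for k, a in enumerate(power_coeffs):
        for j, t in enumerate(POWER_TO_CHEB[k]):
            out[j] += a * t
    return out

def to_powers(cheb_coeffs):
    out = [Fraction(0)] * len(cheb_coeffs)
    for k, a in enumerate(cheb_coeffs):
        for j, p in enumerate(CHEB_TO_POWER[k]):
            out[j] += a * p
    return out

def economize(power_coeffs, base_error, tolerance):
    """Drop trailing Chebyshev terms while the error bound stays below tolerance."""
    cheb = to_chebyshev(power_coeffs)
    bound = Fraction(base_error)
    while len(cheb) > 1 and bound + abs(cheb[-1]) < tolerance:
        bound += abs(cheb.pop())
    return to_powers(cheb), bound

maclaurin = [Fraction(1, d) for d in (1, 1, 2, 6, 24, 120)]
poly, bound = economize(maclaurin, Fraction(38, 10000), Fraction(1, 100))
```

Here `poly` holds the coefficients of the economized cubic in ascending powers of $x$, and `bound` the accumulated error bound, the starting value 0.0038 being the rounded bound of (9.8.14).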

The procedure described here, called the economization of power series 
by Lanczos, is useful in those situations when a minimization of the number 
of numerical operations is desirable. It clearly can be applied to any polynomial, 
whether that polynomial is obtained by truncating a power series or otherwise, 
once the interval of interest has been transformed to the interval [—1, 1]. An 
obvious alternative method consists of a stepwise elimination of powers, first 
eliminating $x^n$, then $x^{n-1}$, and so forth, until the total error is about to exceed
the error tolerance. Other modifications and systematizations appear in the 
literature. 


9.9 Uniform (Minimax) Polynomial Approximation 


The problem of determining the approximation $y^*(x)$ of specified type to a given
function $f(x)$ over a finite interval $[a, b]$, such that the maximum error over
$[a, b]$ is minimized

$$\max_{a \le x \le b} |f(x) - y^*(x)| = \min \tag{9.9.1}$$

is often referred to as a minimax problem.

We suppose here that y*(x) is to be a polynomial of degree n or less, where 
n is specified. In this case, we have seen that the procedures considered in 
Secs. 7.9, 9.7, and 9.8 provide approximations which tend to have this property 
for certain classes of functions but which generally are not optimum for a 
specific f(x). The purpose of this section is to indicate an iterative process 
which can be used to determine a sequence of successive improvements to an 
initial approximation, permitting the determination of an approximation 
arbitrarily close to the optimum one. 

The basis of this process is a theorem associated with Chebyshev (but
apparently due to Borel), which states that when $f(x)$ is continuous on $[a, b]$,
there is one and only one polynomial $y^*(x)$ of degree not exceeding $n$ such that
(9.9.1) is true, and that the optimum $y^*(x)$ is uniquely characterized by what is
often called the equal-ripple property: the deviation $\delta^*(x) = f(x) - y^*(x)$ takes
on its maximum absolute value at least $n + 2$ times on $[a, b]$, with alternating
signs at successive extrema.† Because of this property, a minimax approximation
also often is referred to as a uniform approximation (or as a Chebyshev
approximation, to further increase the ambiguity of that term).

The equal-ripple property of the minimax approximation suggests a crude 
iterative process in which a polynomial approximation $y_1(x)$ is first determined, 
perhaps by collocation with f(x) at a certain appropriately chosen set of n + 1 
points $x_0, x_1, \ldots, x_n$, and the corresponding deviation $\delta_1(x)$ is calculated and 
plotted over [a, b]. The points of collocation are then readjusted so that two 
points separated by a preponderant error ripple are brought a bit closer together 
while two points separated by an underdeveloped ripple are moved further 
apart, to yield a new approximation $y_2(x)$. A reasonably small number of 
successive adjustments frequently yields a satisfactory (nearly optimum) 
approximation in practice. 

For a more systematic process of successive refinement, one can make use 
of the following algorithm, which is similar to one of several suggested by 
Rémés: 


1 Choose a set of n + 2 successive points $c_0, c_1, \ldots, c_{n+1}$ in [a, b] and 
determine y(x) and a constant E such that 

$$f(c_k) - y(c_k) = (-1)^k E \tag{9.9.2}$$

for k = 0, 1, ..., n + 1. 
2 Determine a set of n + 2 points $c_0', c_1', \ldots, c_{n+1}'$ in [a, b] at which 
the deviation $\delta(x) = f(x) - y(x)$ has successive local extrema with 
alternating signs, including in that set the point at which the magnitude of 
$\delta(x)$ is maximum on [a, b]. 
3 Repeat steps 1 and 2 with $c_0, \ldots, c_{n+1}$ replaced by $c_0', \ldots, c_{n+1}'$, and 
E replaced by a new unknown quantity $E'$. 


The repetition of this process generates a sequence of approximations to f(x) 
over [a, b] which assuredly converges to the minimax approximation y*(x) 
when f(x) is continuous on [a, δ]. (See references listed in Sec. 9.19.) 

† Generally the optimum deviation has exactly n + 2 extrema on [a, b]. Exceptional 
situations usually are predictable because of the presence of symmetry. For example, 
the best linear approximation to $f(x) = x^2$ on [-1, 1] in the minimax sense is the 
constant $y^*(x) = \tfrac{1}{2}$, which accordingly is also the best approximation of degree 0. 
Here there are three extreme deviations at -1, 0, and 1 of equal magnitude $\tfrac{1}{2}$. 

It is seen that the set of conditions (9.9.2) requires that the values taken on 
by the deviation f(x) - y(x) at the points $c_0, c_1, \ldots, c_{n+1}$ be of equal (but 
unknown) magnitude |E| and of alternating sign. The set comprises n + 2 
linear algebraic equations in n + 2 unknown parameters, namely, the unknown 
deviation E and n + 1 parameters specifying y(x), say, $a_0, a_1, \ldots, a_n$ in the 
relation 

$$y(x) = a_0 + a_1 x + \cdots + a_n x^n \tag{9.9.3}$$


The initial approximation need not be determined as in step 1 but may be more 
efficiently identified with an approximation obtained otherwise, say, as in 
Sec. 9.7 or 9.8. 

Clearly the numerical determination of the local extrema of the deviation 
at each stage may present a substantial challenge. However, if the initial 
approximation to $y^*(x)$ is a good one, it may be expected that each of the new 
critical points $c_0', \ldots, c_{n+1}'$ will be near to its predecessor. Also, except perhaps 
for the final approximation, the critical points need not be determined with high 
precision. Thus, if values of $\delta(x)$ are determined at a set of points near a preceding 
critical point $c_k$, a low-order process of inverse interpolation can be used to 
approximate $c_k'$. For this purpose, use may be made of the fact that if $\alpha_1$, $\alpha_2$, 
and $\alpha_3$ are near a point $\alpha$ where $f'(\alpha) = 0$, an approximation to $\alpha$ is afforded 
by the formula 

$$\alpha \approx \frac{\alpha_1 + 2\alpha_2 + \alpha_3}{4} - \frac{f[\alpha_1, \alpha_2] + f[\alpha_2, \alpha_3]}{4 f[\alpha_1, \alpha_2, \alpha_3]} \tag{9.9.4}$$

with the notation of divided differences (see Prob. 23 of Chap. 2). When 
$\alpha_3 - \alpha_2 = \alpha_2 - \alpha_1 = h$, this relation becomes 

$$\alpha \approx \alpha_2 + \frac{h}{2}\,\frac{f_1 - f_3}{f_1 - 2f_2 + f_3} \tag{9.9.5}$$

where $f_k = f(\alpha_k)$. 
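As a brief illustration, the equal-spacing formula (9.9.5) can be sketched in a few lines of Python. The function name `quadratic_extremum` and its argument order are chosen here for illustration only; the sample data are the values of the deviation of the economized cubic (9.9.7) from e^x at x = 0.6, 0.7, and 0.8.

```python
import math

def quadratic_extremum(a2, h, f1, f2, f3):
    """Stationary point of the parabola through (a2 - h, f1), (a2, f2),
    (a2 + h, f3), following (9.9.5)."""
    curvature = f1 - 2.0 * f2 + f3
    if curvature == 0.0:
        raise ValueError("ordinates are collinear; no extremum is defined")
    return a2 + 0.5 * h * (f1 - f3) / curvature

def deviation(x):
    # Deviation e^x - y_1(x) with the four-place coefficients (9.9.7).
    return math.exp(x) - (0.9948 + 0.9974 * x + 0.5417 * x**2 + 0.1771 * x**3)

# Refine the extremum of the deviation located near x = 0.7.
c_new = quadratic_extremum(0.7, 0.1, deviation(0.6), deviation(0.7), deviation(0.8))
```

The refined abscissa should fall close to the tabulated critical point near 0.70 for this deviation, and the formula is exact whenever the sampled function is itself a quadratic.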

Care must be taken to introduce the point at which |5(x)| is maximum 
as a new critical point in each cycle whether or not that point is a perturbation | 
of a preceding one. 

In illustration, we consider the example of Sec. 9.8 in which the third-degree 
approximation (9.8.17) was obtained for the function $e^x$ over the interval 
[-1, 1], by the economization process, in the form 

$$y_1(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 \tag{9.9.6}$$

where 

$$a_0 = 0.9948 \qquad a_1 = 0.9974 \qquad a_2 = 0.5417 \qquad a_3 = 0.1771 \tag{9.9.7}$$

when the coefficients are rounded to four places. When the deviation $\delta_1(x) = 
e^x - y_1(x)$ is tabulated by tenths over [-1, 1], it is found that its five extrema 
are at the two end points and near x = -0.7, x = 0, and x = 0.7. The use of 
(9.9.5) then yields the following approximate data for the extrema: 

      c_k       delta_1(c_k)
    -1.000        0.0059
    -0.676       -0.0047
     0.030        0.0052
     0.704       -0.0054
     1.000        0.0073

Thus the ripples already are of roughly equal amplitude, the principal deviation 
being at x = 1. 
The equations 

$$e^{c_k} - y(c_k) = (-1)^k E \qquad (k = 0, 1, 2, 3, 4)$$

are then solved to determine the coefficients of y(x) and the constant E; 
and the results 

$$a_0 = 0.9946 \qquad a_1 = 0.9958 \qquad a_2 = 0.5430 \qquad a_3 = 0.1794 \tag{9.9.8}$$

and 

$$E = 0.0055 \tag{9.9.9}$$

are obtained to four places. For the resultant modified approximation 

$$y_2(x) = 0.9946 + 0.9958x + 0.5430x^2 + 0.1794x^3 \tag{9.9.10}$$

the approximate extrema of $\delta_2(x)$ then are determined as follows: 

      c_k       delta_2(c_k)
    -1.000        0.0055
    -0.677       -0.0055
     0.048        0.0055
     0.725       -0.0056
     1.000        0.0055

Thus it appears that the first modification (9.9.10) is very nearly the desired 
third-degree minimax approximation to $e^x$ over [-1, 1]. 
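The three-step exchange process of this section can be sketched as follows in Python with NumPy. This is an illustrative sketch under stated assumptions rather than the text's own procedure: the extrema of the deviation are located on a fixed sampling grid instead of by the interpolation formula (9.9.5), the starting reference set is taken at the Chebyshev extreme points, and the names `remez`, `cycles`, and `grid` are hypothetical. It also assumes that the deviation has exactly n + 2 intervals of constant sign, as is the case for e^x.

```python
import numpy as np

def remez(f, n, a=-1.0, b=1.0, cycles=8, grid=4001):
    """Exchange iteration for the degree-n minimax polynomial of f on [a, b].
    Returns (coefficients a_0..a_n, levelled deviation E)."""
    k = np.arange(n + 2)
    signs = (-1.0) ** k
    # Assumed starting reference: Chebyshev extreme points mapped to [a, b].
    c = 0.5 * (a + b) - 0.5 * (b - a) * np.cos(np.pi * k / (n + 1))
    xs = np.linspace(a, b, grid)
    for _ in range(cycles):
        # Step 1: solve y(c_k) + (-1)^k E = f(c_k) for a_0..a_n and E, as in (9.9.2).
        A = np.column_stack([np.vander(c, n + 1, increasing=True), signs])
        sol = np.linalg.solve(A, f(c))
        coef, E = sol[:-1], sol[-1]
        # Step 2: sample the deviation and split the grid where its sign changes.
        dev = f(xs) - np.polynomial.polynomial.polyval(xs, coef)
        s = np.where(dev >= 0.0, 1, -1)
        cuts = np.nonzero(s[1:] != s[:-1])[0] + 1
        pieces = np.split(np.arange(grid), cuts)
        if len(pieces) != n + 2:
            raise RuntimeError("deviation does not have n + 2 alternating ripples")
        # Step 3: the largest |deviation| in each piece gives the new reference set.
        c = np.array([xs[p[np.argmax(np.abs(dev[p]))]] for p in pieces])
    return coef, E

coef, E = remez(np.exp, 3)
```

For the cubic approximation to e^x on [-1, 1], the returned coefficients and deviation should agree with (9.9.8) and (9.9.9) to about three decimal places, since a few further cycles change the first modification only slightly.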


9.10 Spline Approximation 


Instead of approximating a given function f(x) over an interval [a, b] by a single 
polynomial, one may divide [a, b] into n subintervals $[a, x_1], [x_1, x_2], \ldots, 
[x_{n-1}, b]$ and approximate f(x) by a different polynomial on each subinterval. 
For example, we may recall that the repeated midpoint, trapezoidal, and 
parabolic rules for approximate integration result from the process of replacing 
the integrand by piecewise-polynomial approximations of degree 0, 1, and 2 
(or 3), respectively, with subintervals of uniform length. In the first case the 
approximation (a step function) generally is discontinuous at each division point 
$x_k$; in the other two cases this statement applies instead to the derivative. 

For some purposes, particularly for numerical differentiation, it is highly 
desirable that the joins of the separate arcs be as "smooth" as possible. Specifically, 
if it is required that in each subinterval the approximation s(x) be a polynomial 
of maximum degree 3, that s(x) agree with f(x) at each of the n + 1 
points 

$$x_0 = a, \quad x_1, \quad x_2, \quad \ldots, \quad x_{n-1}, \quad x_n = b$$

and that the first and second derivatives s'(x) and s''(x) be continuous on [a, b], 
then s(x) is called a (cubic) spline (because of an analogy between its theory and 
the linearized theory of the so-called draftsman's spline).† The present section 
and Secs. 9.11 to 9.13 deal with some of its properties. 

In the subinterval $[x_{k-1}, x_k]$ between two division points, or nodes, the 
equation of the spline is readily obtained by replacing $f'_{k-1}$ and $f'_k$ by $s'_{k-1}$ and 
$s'_k$ in the Hermite interpolation formula of Sec. 8.2 (or by direct derivation) in the 
form 

$$s(x) = s'_{k-1}\frac{(x_k - x)^2 (x - x_{k-1})}{h_k^2} - s'_k\frac{(x - x_{k-1})^2 (x_k - x)}{h_k^2}
 + f_{k-1}\frac{(x_k - x)^2 [2(x - x_{k-1}) + h_k]}{h_k^3} + f_k\frac{(x - x_{k-1})^2 [2(x_k - x) + h_k]}{h_k^3} \tag{9.10.1}$$

with 

$$h_k = x_k - x_{k-1} \tag{9.10.2}$$

The values $s'_{k-1} = s'(x_{k-1})$ and $s'_k = s'(x_k)$ are not known in advance but are 
to be determined, together with the unknown values of s'(x) at the other nodes, 
so that s''(x) is continuous at each of the interior nodes. For this purpose, we 
obtain from (9.10.1) the relations 

$$s''(x_k-) = \frac{2}{h_k}(s'_{k-1} + 2s'_k) - 6\,\frac{f_k - f_{k-1}}{h_k^2} \qquad (k > 0) \tag{9.10.3}$$

and 

$$s''(x_k+) = -\frac{2}{h_{k+1}}(2s'_k + s'_{k+1}) + 6\,\frac{f_{k+1} - f_k}{h_{k+1}^2} \qquad (k < n) \tag{9.10.4}$$

the last expression emerging as $s''(x_{k-1}+)$ with k replaced by k + 1. 


† Splines of higher degree, as well as spline surfaces, have also been studied (see 
references listed in Sec. 9.19). 

The requirement that s'' be continuous at each interior node $x_k$ thus leads 
to the n - 1 linear equations 

$$\frac{1}{h_k}s'_{k-1} + 2\left(\frac{1}{h_k} + \frac{1}{h_{k+1}}\right)s'_k + \frac{1}{h_{k+1}}s'_{k+1}
 = 3\,\frac{f_k - f_{k-1}}{h_k^2} + 3\,\frac{f_{k+1} - f_k}{h_{k+1}^2} \qquad (k = 1, 2, \ldots, n-1) \tag{9.10.5}$$

relating the n + 1 unknown spline slopes $s'_0, \ldots, s'_n$. Two additional conditions 
remain to be prescribed. 

Alternatively, one can express s(x) in $[x_{k-1}, x_k]$ in terms of $s''_{k-1}$, $s''_k$, 
$f_{k-1}$, and $f_k$ (see Prob. 35) and then deduce the n - 1 conditions 

$$\frac{h_k}{6}s''_{k-1} + \frac{h_k + h_{k+1}}{3}s''_k + \frac{h_{k+1}}{6}s''_{k+1}
 = \frac{f_{k+1} - f_k}{h_{k+1}} - \frac{f_k - f_{k-1}}{h_k} \qquad (k = 1, 2, \ldots, n-1) \tag{9.10.6}$$

relating the unknown quantities $s''_0, \ldots, s''_n$, by requiring continuity in s'(x) 
at each interior node. 

Once two appropriate auxiliary conditions (specifying, say, the values of 
$s'_0$ and $s'_n$, or of $s''_0$ and $s''_n$) are prescribed, the solution of either (9.10.5) or (9.10.6) 
serves to determine s(x) in each subinterval of [a, b]. 

In order to exhibit a minimal property associated with spline approximations, 
we suppose that s(x) is a spline approximation to f(x) on [a, b] satisfying 
a specified pair of auxiliary conditions. Furthermore, we consider any competing 
approximation y(x) which also agrees with f(x) at the n + 1 spline nodes $x_k$ 
(k = 0, 1, ..., n) and which also has two continuous derivatives on [a, b]. 
Starting with the identity 

$$\int_a^b (y'')^2\,dx - \int_a^b (s'')^2\,dx = \int_a^b (y'' - s'')^2\,dx + 2\int_a^b s''(y'' - s'')\,dx \tag{9.10.7}$$

we transform the last integral by use of integration by parts to obtain the 
relation 

$$2\int_a^b s''(y'' - s'')\,dx = 2\sum_{k=0}^{n-1}\left\{\Big[s''(y' - s')\Big]_{x_k}^{x_{k+1}} - \int_{x_k}^{x_{k+1}} s'''(y' - s')\,dx\right\} \tag{9.10.8}$$

A subdivision of the interval [a, b] was necessary here since s''' generally is 
discontinuous at each interior node. Since s''' is constant over each subinterval 
and since y = s at each node, each of the summed integrals in (9.10.8) vanishes; 
and since s'', y', and s' are continuous, the other sum telescopes to yield the 
evaluation 

$$2\int_a^b s''(y'' - s'')\,dx = 2\Big[s''(y' - s')\Big]_a^b \tag{9.10.9}$$

Whenever s(x) is such that (9.10.9) vanishes for every member y(x) of a 
certain set of admissible competitors, the right-hand member of (9.10.7) is 
positive for any such y(x) when $y(x) \ne s(x)$, and hence the same statement 
applies to the left-hand member. It then follows that 

$$\int_a^b (s'')^2\,dx \le \int_a^b (y'')^2\,dx \tag{9.10.10}$$

with equality holding only when y(x) = s(x). Thus, in such cases, the spline 
approximation is "more smooth" than any member of the relevant set of competitors 
in the sense that the mean-square value of its second derivative over 
[a, b] is minimal. Since the curvature $y''/(1 + y'^2)^{3/2}$ is approximated by y'' when 
the slope y' is small, this property of the spline approximation is sometimes 
termed a minimum curvature property. However, there is no implication that the 
slope is, in fact, to be small. 

Among the situations in which (9.10.9) assuredly vanishes, so that (9.10.10) 
is valid, the following may be noted: 

1 If the requirements 

$$s''(a) = 0 \qquad s''(b) = 0 \tag{9.10.11}$$

are selected as the auxiliary conditions completing the specification of s(x), 
then (9.10.10) is true for any admissible y(x). (A spline satisfying these 
conditions is sometimes called a "natural spline.") 
2 If the auxiliary conditions 

$$s'(a) = f'(a) \qquad s'(b) = f'(b) \tag{9.10.12}$$

are selected, then (9.10.10) is true for any admissible y(x) whose derivative 
also agrees with f'(x) at the end points of [a, b]. 
3 If the function f(x) is such that 

$$f(b-) = f(a+) \qquad f'(b-) = f'(a+) \qquad f''(b-) = f''(a+) \tag{9.10.13}$$

and if s(x) is required to have the same properties, so that the auxiliary 
conditions 

$$s'(b-) = s'(a+) \qquad s''(b-) = s''(a+) \tag{9.10.14}$$

are selected, then (9.10.10) holds for any admissible competitor y(x) for 
which y'(b-) = y'(a+). The conditions (9.10.13) will be satisfied, in 
particular, if f(x) is a periodic function of period b - a and are often 
referred to as periodicity conditions in the present context, whether or not 
f(x) is in fact periodic. The conditions (9.10.14) can be conveniently 
and equivalently replaced by the requirement that the spline domain be 
extended over an additional subinterval $[x_n, x_{n+1}]$ of length $h_{n+1} = h_1$, 
and that (9.10.5) or (9.10.6) hold also for k = n, with 

$$s'_n = s'_0 \quad s'_{n+1} = s'_1 \qquad \text{or} \qquad s''_n = s''_0 \quad s''_{n+1} = s''_1 \tag{9.10.15}$$

The spline s(x) so defined then is called a periodic spline. 

Once a pair of auxiliary conditions has been selected, the spline approximation 
is to be determined by solving one of the two equation sets (9.10.5) 
and (9.10.6). When this solution is to be obtained by numerical methods, 
simplifications follow from the fact that the coefficient matrix in either case is of 
so-called tridiagonal form, with dominant diagonal elements, permitting the use 
of specially designed numerical procedures. (See, for example, Secs. 10.8 and 
10.9.) 
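A minimal Python sketch of this computation is given below for the slope equations (9.10.5) with the end slopes prescribed as in (9.10.12). The function name `spline_slopes` and its arguments are illustrative assumptions, and the tridiagonal system is solved by ordinary forward elimination and back substitution without pivoting, which the diagonal dominance is assumed to make safe.

```python
import numpy as np

def spline_slopes(x, f, d0, dn):
    """Nodal slopes s'_0..s'_n of the cubic spline through (x_k, f_k)
    with prescribed end slopes d0 = s'_0 and dn = s'_n; needs n >= 2."""
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    n = len(x) - 1
    h = np.diff(x)                      # h[k-1] holds h_k = x_k - x_{k-1}
    m = n - 1                           # number of interior unknowns
    lower, diag = np.zeros(m), np.zeros(m)
    upper, rhs = np.zeros(m), np.zeros(m)
    for k in range(1, n):               # one row of (9.10.5) per interior node
        hk, hk1 = h[k - 1], h[k]
        lower[k - 1] = 1.0 / hk
        diag[k - 1] = 2.0 * (1.0 / hk + 1.0 / hk1)
        upper[k - 1] = 1.0 / hk1
        rhs[k - 1] = 3.0 * (f[k] - f[k - 1]) / hk**2 + 3.0 * (f[k + 1] - f[k]) / hk1**2
    rhs[0] -= lower[0] * d0             # move the known end slopes to the right side
    rhs[-1] -= upper[-1] * dn
    for i in range(1, m):               # forward elimination
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    s = np.zeros(m)
    s[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):      # back substitution
        s[i] = (rhs[i] - upper[i] * s[i + 1]) / diag[i]
    return np.concatenate(([d0], s, [dn]))

nodes = np.array([0.0, 0.5, 1.2, 2.0, 3.0])
slopes = spline_slopes(nodes, nodes**3, 0.0, 27.0)
```

Because a single cubic is its own spline when the end slopes are exact, the slopes computed for f(x) = x^3 on these unequally spaced nodes should reproduce 3x^2 at every node to rounding error.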


9.11 Splines with Uniform Spacing 
When the nodes (division points) are equally spaced so that 

$$h_k = x_k - x_{k-1} = h \qquad (k = 1, 2, \ldots, n) \tag{9.11.1}$$

the preceding formulation simplifies. In particular, the equation set (9.10.5) 
takes the form 

$$s'_{k+1} + 4s'_k + s'_{k-1} = \frac{6}{h}\,\mu\delta f_k \qquad (k = 1, 2, \ldots, n-1) \tag{9.11.2}$$

where $\mu\delta f_k$ is the mean central difference 

$$\mu\delta f_k = \frac{f_{k+1} - f_{k-1}}{2} \tag{9.11.3}$$

and the set (9.10.6) becomes 

$$s''_{k+1} + 4s''_k + s''_{k-1} = \frac{6}{h^2}\,\delta^2 f_k \qquad (k = 1, 2, \ldots, n-1) \tag{9.11.4}$$

with the usual central-difference notation. 
In order to obtain analytical solutions of the governing difference equations, 
we notice first that the associated homogeneous equation 

$$u_{k+1} + 4u_k + u_{k-1} = 0 \tag{9.11.5}$$

is satisfied by $u_k = \beta^k$ if $\beta = -(2 \pm \sqrt{3})$. Hence, if account is taken of the 
fact that $2 - \sqrt{3} = 1/(2 + \sqrt{3})$, and if $2 + \sqrt{3}$ is denoted by $e^\alpha$ so that 

$$e^\alpha = 2 + \sqrt{3} \qquad \sinh\alpha = \sqrt{3} \qquad \cosh\alpha = 2 \tag{9.11.6}$$

it can be deduced that the general solution of (9.11.5) is expressible in the form 

$$u_k = (-1)^k (A\cosh k\alpha + B\sinh k\alpha) \tag{9.11.7}$$

when k takes on integral values. 

Use then can be made of a method of variation of parameters (see, for 
example, Hildebrand [1968], Sec. 1.6), to obtain a particular solution of the 
modified equation in which a nonzero right-hand member is present, after 
which superposition yields the general solution of that equation. (See also 
Prob. 39.) 

In this way, the solution of (9.11.2) can be obtained in the form 

$$(-1)^k s'_k = s'_0\,\frac{\sinh (n-k)\alpha}{\sinh n\alpha} + (-1)^n s'_n\,\frac{\sinh k\alpha}{\sinh n\alpha}
 + \frac{6}{h}\sum_{r=1}^{n-1} (-1)^r G_{rk}\,\mu\delta f_r \tag{9.11.8}$$

where 

$$G_{rk} = \begin{cases} \dfrac{\sinh r\alpha\,\sinh (n-k)\alpha}{\sinh\alpha\,\sinh n\alpha} & (r \le k) \\[2ex] \dfrac{\sinh k\alpha\,\sinh (n-r)\alpha}{\sinh\alpha\,\sinh n\alpha} & (r \ge k) \end{cases} \tag{9.11.9}$$

It is seen that $G_{rk} = G_{kr}$, so that only those values of $G_{rk}$ for which $r \le k$ 
need to be calculated, the other values then being obtained by symmetry. In 
fact, since also $G_{n-r,\,n-k} = G_{kr}$, the square matrix of values of $G_{rk}$ is symmetric 
about both diagonals. 

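An additional simplification follows from the fact that $\sinh k\alpha$ satisfies 
the recurrence formula 

$$\sinh (k+1)\alpha = 4\sinh k\alpha - \sinh (k-1)\alpha$$

In particular, when k is an integer, it follows that $\sinh k\alpha$ is an integral multiple 
of $\sinh\alpha = \sqrt{3}$. Thus, if we write 

$$g_k = \frac{\sinh k\alpha}{\sinh\alpha} = \frac{\sinh k\alpha}{\sqrt{3}} \tag{9.11.10}$$

we can recast (9.11.8) and (9.11.9) in the more convenient form 

$$(-1)^k s'_k = \frac{g_{n-k}}{g_n}\,s'_0 + (-1)^n\,\frac{g_k}{g_n}\,s'_n + \frac{6}{h}\sum_{r=1}^{n-1} (-1)^r G_{rk}\,\mu\delta f_r \tag{9.11.11}$$

where 

$$G_{rk} = \frac{g_r\,g_{n-k}}{g_n} \quad (r \le k) \qquad\qquad G_{rk} = G_{kr} \quad (r \ge k) \tag{9.11.12}$$

and where 

$$g_0 = 0 \qquad g_1 = 1 \qquad g_2 = 4 \qquad g_3 = 15 \qquad g_{k+1} = 4g_k - g_{k-1} \tag{9.11.13}$$

Analogously, from (9.11.4) there follows 

$$(-1)^k s''_k = \frac{g_{n-k}}{g_n}\,s''_0 + (-1)^n\,\frac{g_k}{g_n}\,s''_n + \frac{6}{h^2}\sum_{r=1}^{n-1} (-1)^r G_{rk}\,\delta^2 f_r \tag{9.11.14}$$

with the same definitions of $g_k$ and $G_{rk}$. 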
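The closed form can be sketched in Python as follows, using the integers g_k generated by the recurrence 4g_k - g_{k-1} and the weights G_rk = g_r g_{n-k} / g_n for r not exceeding k. The function names are hypothetical, and the sign of the summed term is taken as positive, which is the choice that satisfies the difference equation for the slopes with right-hand member (6/h) times the mean central difference.

```python
import numpy as np

def g_sequence(n):
    """Integers g_0..g_n with g_0 = 0, g_1 = 1, g_{k+1} = 4 g_k - g_{k-1}."""
    g = [0, 1]
    for k in range(1, n):
        g.append(4 * g[k] - g[k - 1])
    return g

def slopes_closed_form(f, h, d0, dn):
    """Spline slopes at equally spaced nodes from the closed-form sum,
    given ordinates f_0..f_n and end slopes d0 = s'_0, dn = s'_n."""
    n = len(f) - 1
    g = g_sequence(n)
    s = [d0]
    for k in range(1, n):
        # End-effect terms carrying the prescribed end slopes.
        total = g[n - k] / g[n] * d0 + (-1) ** n * g[k] / g[n] * dn
        for r in range(1, n):
            G = g[min(r, k)] * g[n - max(r, k)] / g[n]   # symmetric in r and k
            mean_diff = (f[r + 1] - f[r - 1]) / 2.0       # mean central difference
            total += (6.0 / h) * (-1) ** r * G * mean_diff
        s.append((-1) ** k * total)
    s.append(dn)
    return np.array(s)
```

A direct solution of the tridiagonal slope equations for the same data gives an independent check on the result; for large n the direct solution is also cheaper, since the sum costs of the order of n^2 operations.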
The constants $s'_0$, $s'_n$ or $s''_0$, $s''_n$ are to be determined by the chosen auxiliary 
conditions. It is useful to notice that if n is large, and if k is neither near zero nor 
near n, it follows that 

$$\frac{g_{n-k}}{g_n} = \frac{\sinh (n-k)\alpha}{\sinh n\alpha} \approx e^{-k\alpha} \approx e^{-1.3k}
 \qquad\qquad \frac{g_k}{g_n} = \frac{\sinh k\alpha}{\sinh n\alpha} \approx e^{-(n-k)\alpha} \approx e^{-1.3(n-k)} \tag{9.11.15}$$

Thus the end-effect terms in (9.11.11) and (9.11.14), which depend upon the 
selection of auxiliary conditions, damp out rather rapidly toward the interior 
of [a, b] when n is large. Consequently, one might conclude that the selection 
of the auxiliary conditions generally is unimportant. 

In accordance with this argument, the choice $s''_0 = 0$, $s''_n = 0$ has been a 
popular one because of its simplicity and also because of the fact that then the 
mean-square value of s'' over [a, b] will indeed be minimized under all admissible 
competition. 

Additional insight is afforded by a consideration of this particular 
approximation in the special case of the function 

$$f(x) = x^2 \tag{9.11.16}$$

Here it is found that† 

$$s''_k = 2\left[1 - (-1)^k\,\frac{g_{n-k} + (-1)^n g_k}{g_n}\right] \tag{9.11.17}$$

and also that 

$$s'_0 = 2x_0 + \frac{h}{3}\left[2 - \frac{g_{n-1} + (-1)^n}{g_n}\right]$$

$$s'_k = 2x_k - (-1)^k\,\frac{h}{6}\,\frac{(-1)^n (g_{k+1} - g_{k-1}) - (g_{n-k+1} - g_{n-k-1})}{g_n} \qquad (0 < k < n) \tag{9.11.18}$$

$$s'_n = 2x_n - \frac{h}{3}\left[2 - \frac{g_{n-1} + (-1)^n}{g_n}\right]$$

† In this simple case it is most convenient to attack the difference equation (9.11.4) 
directly rather than specialize the solution (9.11.14). 

Thus, at the ends and at nearby nodes, the error in $s'_k$ is small only of order h; 
that is, its magnitude decreases in proportion to 1/n as n is increased. The same 
is true of the error in s'(x) at other points near the ends, while the error in s(x) 
is found to be of order $h^2$ at such points in this case. This situation is particularly 
deplorable in the present case since here, in each subinterval, we are attempting 
to approximate a second-degree polynomial by one of third degree; but it is 
also typical of the end-effect contribution to the error distributions associated 
with the auxiliary conditions $s''_0 = s''_n = 0$ in the general case, unless f''(x) 
also vanishes at both end points.† 

In the so-called periodic case described by (9.10.13), none of the quantities 
$s'_0$, $s'_n$, $s''_0$, and $s''_n$ are directly prescribed, but they are determined by the auxiliary 
conditions (9.10.14) which are consistent with the behavior of f(x) at the end 
points, so that undesirable end effects are not introduced. (See Prob. 42.) In 
other cases, this introduction can be avoided without the sacrifice of a minimum 
curvature property if use is made of the auxiliary conditions (9.10.12), so that the 
value of s’(x) at each end point is identified with the value of f’(x) at that point, 
provided that f’(a) and f’(b) are known. Alternative procedures which do not 
require these data are considered in Sec. 9.13. 
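The order-h end effect of the conditions s''(a) = s''(b) = 0 can be observed numerically for f(x) = x^2 with a short Python sketch. The function name is hypothetical. The end slope is recovered from the computed second derivatives by the relation s'_0 = (f_1 - f_0)/h - h(2s''_0 + s''_1)/6, which holds for any cubic arc with second-derivative values s''_0 and s''_1 at its ends but is an assumption in the sense that it is not written out in this section.

```python
import numpy as np

def natural_spline_end_slope_error(n, a=0.0, b=1.0):
    """Error s'(a) - f'(a) of the natural spline to f(x) = x^2 on n equal
    subintervals of [a, b], together with the spacing h."""
    h = (b - a) / n
    x = np.linspace(a, b, n + 1)
    f = x**2
    # Equal-spacing second-derivative equations with zero end values.
    A = 4.0 * np.eye(n - 1) + np.eye(n - 1, k=1) + np.eye(n - 1, k=-1)
    rhs = 6.0 * (f[2:] - 2.0 * f[1:-1] + f[:-2]) / h**2
    m = np.concatenate(([0.0], np.linalg.solve(A, rhs), [0.0]))
    # End slope of the first cubic arc from its end second derivatives.
    s0 = (f[1] - f[0]) / h - h * (2.0 * m[0] + m[1]) / 6.0
    return s0 - 2.0 * a, h

err8, h8 = natural_spline_end_slope_error(8)
err16, h16 = natural_spline_end_slope_error(16)
```

Doubling n should roughly halve the end-slope error, and the ratio of the error to h should settle near 1/sqrt(3), in keeping with the statement that the error in the end slope is only of order h.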


9.12 Spline Error Estimates 


In order to estimate the errors which would be associated with a spline approximation 
at interior points if end effects were absent, we notice first that by 
averaging (9.10.3) and (9.10.4), with $h_k = h_{k+1} = h$, we obtain the relation 

$$s''_k = \frac{3}{h^2}(f_{k+1} - 2f_k + f_{k-1}) - \frac{1}{h}(s'_{k+1} - s'_{k-1}) \tag{9.12.1}$$

when $1 \le k \le n - 1$. In a similar way (see Prob. 35), a formula relating 
$s'_k$ to $s''_{k+1}$ and $s''_{k-1}$ can be obtained in the form 

$$s'_k = \frac{f_{k+1} - f_{k-1}}{2h} - \frac{h}{12}(s''_{k+1} - s''_{k-1}) \tag{9.12.2}$$

with the same restriction on k. 

† Incorrect error bounds, which are inconsistent with this fact, can be found in the 
literature. 

If it is assumed that (when f is sufficiently often differentiable and h 
sufficiently small) there exist representations 

$$s'_k \sim f'_k + A_1 h f''_k + A_2 h^2 f'''_k + \cdots$$

$$s''_k \sim f''_k + B_1 h f'''_k + B_2 h^2 f^{\mathrm{iv}}_k + \cdots$$

where omitted terms corresponding to a specific truncation are of higher order 
in h than those retained, we may deduce first from the symmetry of (9.12.1) 
and (9.12.2) that the terms involving odd powers of h must all vanish. For this 
purpose we recall, for example, that 

$$\frac{f_{k+1} - f_{k-1}}{h} = \frac{f(x_k + h) - f(x_k - h)}{h}$$

and hence conclude that this ratio (and similarly each other ratio present) is 
unchanged when h is replaced by -h. 


If now the representations 

$$s'_k \sim f'_k + A_2 h^2 f'''_k + A_4 h^4 f^{\mathrm{v}}_k + \cdots
 \qquad\qquad s''_k \sim f''_k + B_2 h^2 f^{\mathrm{iv}}_k + B_4 h^4 f^{\mathrm{vi}}_k + \cdots \tag{9.12.3}$$

are introduced into the equal members of (9.12.1) and (9.12.2), and if then each 
term of the form $f^{(r)}_{k\pm 1}$ is expanded as 

$$f^{(r)}_{k\pm 1} = f^{(r)}_k \pm h f^{(r+1)}_k + \frac{h^2}{2} f^{(r+2)}_k \pm \cdots$$

the resultant relations become 

$$f''_k + B_2 h^2 f^{\mathrm{iv}}_k + B_4 h^4 f^{\mathrm{vi}}_k + O(h^6)
 \sim f''_k + h^2\left(-\tfrac{1}{12} - 2A_2\right) f^{\mathrm{iv}}_k + h^4\left(-\tfrac{1}{120} - \tfrac{1}{3}A_2 - 2A_4\right) f^{\mathrm{vi}}_k + O(h^6)$$

and 

$$f'_k + A_2 h^2 f'''_k + A_4 h^4 f^{\mathrm{v}}_k + O(h^6)
 \sim f'_k + h^2 (0) f'''_k + h^4\left(-\tfrac{7}{360} - \tfrac{1}{6}B_2\right) f^{\mathrm{v}}_k + O(h^6)$$

By equating coefficients of corresponding powers of h, we then obtain a set of 
equations relating the A's and B's, which yields 

$$A_2 = 0 \qquad B_2 = -\frac{1}{12} \qquad A_4 = -\frac{1}{180} \qquad B_4 = \frac{1}{360}$$

Accordingly there follows 

$$s'_k \sim f'_k - \frac{h^4}{180} f^{\mathrm{v}}_k + \frac{h^6}{1512} f^{\mathrm{vii}}_k + O(h^8) \tag{9.12.4}$$

and 

$$s''_k \sim f''_k - \frac{h^2}{12} f^{\mathrm{iv}}_k + \frac{h^4}{360} f^{\mathrm{vi}}_k + O(h^6) \tag{9.12.5}$$

where an additional term is included in (9.12.4) for reference purposes. 

It is important to realize that these error relations merely tend to be valid 
at interior nodes as the number of subdivisions of [a, b] increases. The portion 
of the total error which is due to end effects is not detected by the present 
methods since it dies out in proportion to $e^{-n\alpha} = e^{-\alpha(b-a)/h}$ when n is large, 
and hence it is not representable by a power series in h near h = 0. 
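The leading term of the slope error at an interior node, -(h^4/180) times the fifth derivative, can be checked numerically with the following Python sketch. The function names and the test function e^x on [0, 1] are illustrative choices. The exact end slopes are imposed so that the end effects felt at the middle node are negligible.

```python
import numpy as np

def clamped_spline_slopes(f_vals, h, d0, dn):
    """Slopes of the equally spaced cubic spline with prescribed end slopes,
    from s'_{k+1} + 4 s'_k + s'_{k-1} = 3 (f_{k+1} - f_{k-1}) / h."""
    n = len(f_vals) - 1
    A = 4.0 * np.eye(n - 1) + np.eye(n - 1, k=1) + np.eye(n - 1, k=-1)
    rhs = 3.0 * (f_vals[2:] - f_vals[:-2]) / h
    rhs[0] -= d0
    rhs[-1] -= dn
    return np.concatenate(([d0], np.linalg.solve(A, rhs), [dn]))

def interior_ratio(n):
    """Observed slope error at the middle node divided by the predicted
    leading term -(h^4/180) f^(v), for f(x) = exp(x) on [0, 1]."""
    x = np.linspace(0.0, 1.0, n + 1)
    h = 1.0 / n
    s = clamped_spline_slopes(np.exp(x), h, 1.0, np.e)
    k = n // 2
    predicted = -(h**4) / 180.0 * np.exp(x[k])   # every derivative of exp is exp
    return (s[k] - np.exp(x[k])) / predicted

ratio = interior_ratio(16)
```

The ratio should lie close to unity for moderate n, the small departure reflecting the next term of the expansion and the exponentially small end contribution.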

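From the relations (9.12.4) and (9.12.5) we can deduce other such relations 
at the interior nodes. In particular, we note that the right-hand limit of s'''(x) 
at a node is given by 

$$s'''(x_k+) = \frac{1}{h}(s''_{k+1} - s''_k)
 \sim f'''_k + \frac{h}{2} f^{\mathrm{iv}}_k + \frac{h^2}{12} f^{\mathrm{v}}_k - \frac{h^4}{360} f^{\mathrm{vii}}_k - \frac{h^5}{1440} f^{\mathrm{viii}}_k + O(h^6)$$

whereas the left-hand limit is given by 

$$s'''(x_k-) = \frac{1}{h}(s''_k - s''_{k-1})
 \sim f'''_k - \frac{h}{2} f^{\mathrm{iv}}_k + \frac{h^2}{12} f^{\mathrm{v}}_k - \frac{h^4}{360} f^{\mathrm{vii}}_k + \frac{h^5}{1440} f^{\mathrm{viii}}_k + O(h^6)$$

Hence there follows 

$$\tfrac{1}{2}\left[s'''(x_k+) + s'''(x_k-)\right] \sim f'''_k + \frac{h^2}{12} f^{\mathrm{v}}_k + O(h^4) \tag{9.12.6}$$

and 

$$\frac{1}{h}\left[s'''(x_k+) - s'''(x_k-)\right] \sim f^{\mathrm{iv}}_k - \frac{h^4}{720} f^{\mathrm{viii}}_k + O(h^6) \tag{9.12.7}$$

Thus the jump in s''' at $x_k$ would approximate $h f^{\mathrm{iv}}(x_k)$ and the mean of the 
constant values of s''' to the right and to the left of $x_k$ would approximate $f'''(x_k)$ 
if n were sufficiently large and if end effects were not overpowering. 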

Finally, in any subinterval $[x_k, x_{k+1}]$ we may write 

$$f(x) - s(x) = [f(x) - p(x)] + [p(x) - s(x)] \tag{9.12.8}$$

where p(x) is conveniently identified with the cubic Hermite polynomial approximation 
to f(x) on that subinterval, having the property that agreement exists 
at $x_k$ and at $x_{k+1}$ between f(x) and p(x) and also between f'(x) and p'(x). If 
on that subinterval we write $x = x_k + \theta h$, where accordingly $0 \le \theta \le 1$, 
Eq. (8.2.18) takes the form 

$$f(x_k + \theta h) - p(x_k + \theta h) = \frac{h^4}{24}\,\theta^2 (1 - \theta)^2 f^{\mathrm{iv}}(\xi_k) \tag{9.12.9}$$

where $x_k < \xi_k < x_{k+1}$. Further, since p(x) and s(x) agree at $x_k$ and at $x_{k+1}$, 
we must have 

$$p(x) - s(x) = (p'_k - s'_k)\,\frac{(x - x_k)(x_{k+1} - x)^2}{h^2} - (p'_{k+1} - s'_{k+1})\,\frac{(x - x_k)^2 (x_{k+1} - x)}{h^2} \tag{9.12.10}$$

And since also $p'_k = f'_k$ and $p'_{k+1} = f'_{k+1}$, there follows 

$$p(x_k + \theta h) - s(x_k + \theta h) = h\theta(1 - \theta)^2 (f'_k - s'_k) - h\theta^2 (1 - \theta)(f'_{k+1} - s'_{k+1}) \tag{9.12.11}$$

Finally, since (9.12.4) indicates that at the interior nodes the deviation 
f' - s' is asymptotically small of order $h^4$, it follows that the spline approximation 
tends to differ from the cubic Hermite approximation only by terms which 
are small of order $h^5$ in interior subintervals. Consequently, we have also 

$$f(x) - s(x) \sim \frac{h^4}{24}\,\theta^2 (1 - \theta)^2 f^{\mathrm{iv}}(x_k) + O(h^5) \tag{9.12.12}$$

in any interior subinterval $[x_k, x_{k+1}]$, where $\theta = (x - x_k)/h$. It can be verified 
that the factor $\theta^2 (1 - \theta)^2$ does not exceed $\tfrac{1}{16}$ when $0 \le \theta \le 1$. 

We note again that the symbol ~ is used in the present context to mean 
[in the case of (9.12.12)] that the error f(x) - s(x) at a point in an interior 
subinterval differs from $(h^4/24)\,\theta^2 (1 - \theta)^2 f^{\mathrm{iv}}(x_k)$ by terms which depend upon 
the choice of the two auxiliary conditions but which decrease with h = (b - a)/n 
in proportion to $e^{-c/h}$, where $c = \alpha(b - a) \approx 1.3(b - a)$, and by terms which 
decrease at least as rapidly as a multiple of $h^5$ as h → 0. 


9.13 A Special Class of Splines 


Clearly, it is desirable to select auxiliary conditions which will ensure that the 
actual spline errors at the end points $x_0 = a$ and $x_n = b$ tend to zero as h → 0 
(and n → ∞) at about the same rate as those predicted by the preceding asymptotic 
error formulas at interior points. In particular, in order to deal with situations 
in which the use of the ideal conditions $s'_0 = f'_0$ and $s'_n = f'_n$ is not possible 
because f'(a) and f'(b) are not known, or is otherwise undesirable, we are led 
to the desirability of making $s'_0 - f'_0$ and $s'_n - f'_n$ small of order $h^4$ for consistency 
with (9.12.4). 

Reference to (3.11.5) and (3.11.7) shows that one method of accomplishing 
this end (with a minor invalidation of the minimum curvature property and 
under the assumption that $f^{\mathrm{v}}(x)$ is continuous on [a - h, b + h]) consists of 
introducing the value of f at each of two additional nodes $x_{-1} = a - h$ and 
$x_{n+1} = b + h$ outside the original interval [a, b] and imposing the auxiliary 
conditions 

$$s'_0 = \frac{1}{12h}(-3f_{-1} - 10f_0 + 18f_1 - 6f_2 + f_3) \tag{9.13.1}$$

and 

$$s'_n = \frac{1}{12h}(-f_{n-3} + 6f_{n-2} - 18f_{n-1} + 10f_n + 3f_{n+1}) \tag{9.13.2}$$

There then follows 

$$|f'_0 - s'_0| \le \frac{h^4}{20} M_5 \qquad |f'_n - s'_n| \le \frac{h^4}{20} M_5 \tag{9.13.3}$$

where $M_5$ is a bound on $|f^{\mathrm{v}}(x)|$ in [a - h, b + h]. Since also (9.10.4) then gives 

$$s''_0 = -\frac{2}{h}(2s'_0 + s'_1) + \frac{6}{h^2}(f_1 - f_0)$$

$$\quad = -\frac{2}{h}(2f'_0 + f'_1) + \frac{6}{h^2}(f_1 - f_0) + \frac{4}{h}(f'_0 - s'_0) + \frac{2}{h}(f'_1 - s'_1)$$

$$\quad = f''_0 - \frac{h^2}{12} f^{\mathrm{iv}}_0 + O(h^3)$$

and a similar result is obtained for $s''_n$, the errors in s'' at the ends also will be 
small of the same order in h as the asymptotic interior errors predicted by 
(9.12.5). 
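The end-slope condition (9.13.1) is a five-point differentiation formula, and its behavior is easy to confirm with a short Python sketch. The function name `end_slope` is illustrative only. The formula should be exact for polynomials of degree 4 or less, and for f(x) = x^5 at x = 0 the computed slope should exceed the true slope (zero) by (h^4/20) times the fifth derivative 120, that is, by 6h^4.

```python
def end_slope(fm1, f0, f1, f2, f3, h):
    """Five-point estimate of f'(x_0) from ordinates at x_{-1}, x_0, x_1,
    x_2, x_3 with uniform spacing h, as in the left-end auxiliary condition."""
    return (-3.0 * fm1 - 10.0 * f0 + 18.0 * f1 - 6.0 * f2 + f3) / (12.0 * h)

def sample(func, x0, h):
    # Ordinates at the five abscissas used by the end condition.
    return [func(x0 + j * h) for j in (-1, 0, 1, 2, 3)]

h = 0.1
quartic_slope = end_slope(*sample(lambda x: x**4 + x, 0.5, h), h)   # true slope 4(0.5)^3 + 1
quintic_slope = end_slope(*sample(lambda x: x**5, 0.0, h), h)       # true slope 0
```

The right-end condition (9.13.2) is the mirror image of the same formula, so the same check applies with the ordinates taken in reverse order and the sign of the result changed.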

Thus the end errors, the effects of which in any case damp out rather 
rapidly toward the interior of the interval [a, b], are indeed of the same order of 
magnitude as the internal errors which would be present in their absence. This 
fact appears to justify the inconvenience associated with the introduction of the 
two supplementary ordinates when their values are available.† As the spacing is 
refined, perhaps by successive doubling of n, to yield a sequence of approximations, 
the two supplementary abscissas approach the end points of the 
interval [a, b]. 

† Another procedure, due to Curtis [1970], tends to accomplish about the same 
purpose by introducing additional interior nodes at the midpoints of the subintervals 
$[x_0, x_1]$, $[x_1, x_2]$, $[x_{n-2}, x_{n-1}]$, and $[x_{n-1}, x_n]$. Although a subsequent halving of h 
would require the introduction of these points as new basic nodes in any case, the 
fact that the n + 4 subintervals used at each stage are not of uniform length (when 
n > 4) complicates the formulation and adversely affects its accuracy. Such a 
modification would be particularly appropriate, however, if f(x) were undefined 
outside [a, b] or if nonuniform spacings were to be used in any case. 

With these (or equally appropriate) auxiliary conditions satisfied, spline 
approximation is particularly useful for high-precision numerical differentiation 
when sufficiently accurate data are available, since the error $f'_k - s'_k$ decreases 
at least in proportion to $h^4$ as the spacing h is decreased (so that a halving of h 
tends to divide the error in $s'_k$ by a factor of 16) when $f^{\mathrm{v}}(x)$ is continuous on 
[a - h, b + h]. In fact, in this case the errors in the approximations to $f''(x_k)$ and $f'''(x_k)$ 
afforded by $s''(x_k)$ and by $\tfrac{1}{2}[s'''(x_k+) + s'''(x_k-)]$ are small of order $h^2$, and the 
error in the approximation to $f^{\mathrm{iv}}(x_k)$ afforded by $h^{-1}[s'''(x_k+) - s'''(x_k-)]$ is (surprisingly) 
small of order $h^4$ if $f^{\mathrm{viii}}(x)$ is continuous. 

These convergence properties are in sharp contrast with those associated 
with the derivatives of the polynomial y(x) of degree n which agrees with f(x) 
at all the n + 1 nodes $a, x_1, \ldots, x_{n-1}, b$, as n increases, since then even the 
interpolation error f(x) - y(x) may not tend to zero on [a, b] as n → ∞ even 
though f(x) has derivatives of all orders in that interval (Sec. 4.11). 

However, the importance of suitably controlling the effects of roundoff 
errors (emphasized in Sec. 3.8) continues to be essential. Although the mag- 
nitudes of the coefficients of the individual ordinates in (9.11.11) are found to 
grow less rapidly as n increases than do the magnitudes of the corresponding 
coefficients in the sequences of formulas referred to above, still the necessity of 
strong sign variation in each formula remains and is inherently troublesome. 
In particular, if the data are empirical, smoothing of some sort is imperative 
before use is made of any formula for numerical differentiation and still a high 
degree of accuracy rarely can be anticipated. 

The sum $K_k^{(n)}$ of the magnitudes of the coefficients of the ordinates in 
(9.11.11) would be of the form 

$$K_k^{(n)} = \frac{3}{h\,g_n}\left[g_{n-k}(g_k + g_{k-1}) + g_{|n-2k|} + g_k (g_{n-k} + g_{n-k-1})\right] \tag{9.13.4}$$

when $1 \le k \le n - 1$ if $s'_0$ and $s'_n$ were taken to be zero. The imposition of the 
conditions (9.13.1) and (9.13.2) slightly reduces the coefficient of $h^{-1}$ in $K_k^{(n)}$ 
when $1 \le k \le n - 1$ by an amount which rapidly tends to zero toward the 
interior of the interval [a, b] and with increasing n. Thus, if all ordinates could 
possess errors of magnitude ε, the maximum possible associated error in $s'_k$ 
would be slightly smaller than $K_k^{(n)}\varepsilon$. It is found further that the coefficient of 
$h^{-1}$ in $K_k^{(n)}$ does not vary strongly with either k or n and, in addition, that 
K® < > (16 — 9/3) < = (9.13.5) 


for all n > 3 and all k such that $1 \le k \le n - 1$. 

When (9.13.1) and (9.13.2) are satisfied, the asymptotic relation (9.12.4) 
for the truncation error in $f'_k - s'_k$ can be replaced by the more useful estimate 

$$f'_k - s'_k \approx \left[1 - 10(-1)^k\,\frac{g_{n-k} + (-1)^n g_k}{g_n}\right]\frac{h^4}{180}\,f^{\mathrm{v}} \tag{9.13.6}$$

for $0 \le k \le n$, which incorporates the fact that end errors now known to be 
approximated by $-\tfrac{1}{20}h^4 f^{\mathrm{v}}_0$ and $-\tfrac{1}{20}h^4 f^{\mathrm{v}}_n$, together with their propagated 
effects, are superimposed on the errors considered in the derivation of (9.12.4). 

When n = 2, the value of s'(x) at the one interior node can be expressed 
in the form 

$$s'_1 = \frac{1}{12h}(f_{-1} - 8f_0 + 8f_2 - f_3) \tag{9.13.7}$$

when (9.13.1) and (9.13.2) are to hold at the end points; and the corresponding 
formulas when n = 4 are found to be 

$$s'_1 = \frac{1}{672h}(45f_{-1} - 390f_0 - 125f_1 + 588f_2 - 141f_3 + 26f_4 - 3f_5)$$

$$s'_2 = \frac{1}{168h}(-3f_{-1} + 26f_0 - 127f_1 + 127f_3 - 26f_4 + 3f_5) \tag{9.13.8}$$

$$s'_3 = \frac{1}{672h}(3f_{-1} - 26f_0 + 141f_1 - 588f_2 + 125f_3 + 390f_4 - 45f_5)$$

Finally, the formula for the midpoint derivative when n = 8 is listed for reference 
purposes: 

$$s'_4 = \frac{1}{2328h}(-3f_{-1} + 26f_0 - 126f_1 + 498f_2 - 1871f_3
 + 1871f_5 - 498f_6 + 126f_7 - 26f_8 + 3f_9) \tag{9.13.9}$$
Corresponding formulas relating instead to the imposition of the auxiliary 
conditions $s''_0 = s''_n = 0$ are considered in Prob. 47. In accordance with preceding 
analyses, the degree of precision for the formulas (9.13.7) to (9.13.9) is 4 
(that is, these formulas would yield exact results if f(x) were a polynomial of 
degree 4 or less). On the other hand, the degree of precision for the formulas 
corresponding to the auxiliary conditions $s''_0 = s''_n = 0$ is only 1 for off-center 
formulas and 2 for the midpoint formulas. (As has been pointed out before, 
however, the degree of precision of a formula is not necessarily a good measure of 
its usefulness.) 

Some rounded numerical results obtained by use of only two- and four-division 
spline approximations from seven-digit values of the function $f(x) = 
x^{1/3}$ over the interval [1.0, 1.8] are listed in the following table: 

                 s'(x)                              -s''(x)
  x      n = 2     n = 4     f'(x)        n = 2    n = 4    -f''(x)
 1.0    0.33501   0.33349   0.33333       0.2267   0.2222   0.2222
 1.2              0.29512   0.29518                0.1615   0.1640
 1.4    0.26549   0.26636   0.26635       0.1209   0.1261   0.1268
 1.6              0.24365   0.24367                0.1010   0.1015
 1.8    0.22632   0.22530   0.22527       0.0750   0.0825   0.0834

         (1/2)[s'''(x_k+) + s'''(x_k-)]         -(1/h)[s'''(x_k+) - s'''(x_k-)]
  x      n = 2     n = 4     f'''(x)       n = 2    n = 4    -f^iv(x)
 1.0                         0.370                           0.988
 1.2               0.240     0.228                  0.632    0.506
 1.4     0.190     0.151     0.151         0.375    0.260    0.288
 1.6               0.109     0.106                  0.163    0.176
 1.8                         0.077                           0.114


Whereas the asymptotic error estimates would suggest the propriety of 
iterated Richardson extrapolation (as in the Romberg integration process), this 
procedure usually is vitiated by the end effects still remaining, except near the 
midpoint of the interval when n is reasonably large. 

When approximate values of $f''(x_k)$ are required, in place of using values of
$s_k''$ as above it is usually preferable to effect spline differentiation on the calculated
values of s’(x) at the nodes. (This process has been called spline-on-spline
computation.) For this purpose, Eqs. (3.11.4) and (3.11.8) can be rewritten in
the forms

$$
\begin{aligned}
s_{-1}' &= \frac{1}{12h}\,(-25f_{-1} + 48f_0 - 36f_1 + 16f_2 - 3f_3) \\
s_{n+1}' &= \frac{1}{12h}\,(3f_{n-3} - 16f_{n-2} + 36f_{n-1} - 48f_n + 25f_{n+1})
\end{aligned}
\tag{9.13.10}
$$

to supply the necessary additional data for the satisfaction of the auxiliary
conditions. The results of this procedure in the present example are compared
with the s'' values and with the rounded true values in the following table, and
are seen to exhibit substantial improvement (except at x = 1.0):


                      Spline
                        on
  x      -s''(x)      spline      -f''(x)
 1.0     0.22220     0.22310     0.22222
 1.2     0.16149     0.16431     0.16399
 1.4     0.12607     0.12654     0.12684
 1.6     0.10104     0.10152     0.10153
 1.8     0.08252     0.08337     0.08343
Formulas for numerical integration also can be obtained by integrating
spline approximations. In particular, from (9.10.1) there follows

$$
\int_{x_k}^{x_{k+1}} s(x)\,dx = \frac{x_{k+1} - x_k}{2}\,(f_k + f_{k+1})
  - \frac{(x_{k+1} - x_k)^2}{12}\,(s_{k+1}' - s_k')
\tag{9.13.11}
$$

Thus, when the spacing of the nodes is uniform we obtain, by summation, the
simple result

$$
\int_a^b s(x)\,dx = h\left(\tfrac12 f_0 + f_1 + \cdots + f_{n-1} + \tfrac12 f_n\right)
  - \frac{h^2}{12}\,(s_n' - s_0')
\tag{9.13.12}
$$

In particular, for a periodic spline this formula reduces to the trapezoidal rule.
When the spline satisfies the conditions $s_0' = f_0' = f'(a)$, $s_n' = f_n' = f'(b)$,
Eq. (9.13.12) is the Euler-Maclaurin formula with one correction term.
Otherwise, if the end conditions (9.13.1) and (9.13.2) are used to define
s(x), the corresponding approximations to the integral in the cases n = 2, 3,
and 4 are obtained as

$$
\int_a^b f(x)\,dx \approx \frac{h}{72}\,(-f_{-1} + 28f_0 + 90f_1 + 28f_2 - f_3)
\tag{9.13.13}
$$

$$
\int_a^b f(x)\,dx \approx \frac{h}{48}\,(-f_{-1} + 21f_0 + 52f_1 + 52f_2 + 21f_3 - f_4)
\tag{9.13.14}
$$

$$
\int_a^b f(x)\,dx \approx \frac{h}{144}\,(-3f_{-1} + 62f_0 + 163f_1 + 132f_2 + 163f_3 + 62f_4 - 3f_5)
\tag{9.13.15}
$$

where the error in each case differs from the Euler-Maclaurin error [see (5.8.14)]
with m = 1 by $h^6$ terms if $f^{vi}$ is continuous in $[a - h, b + h]$. In fact, it can
be shown that the error is of the form


B= 2= 2a) prey 4 prey] 0.13.16) 
720 a. 1 = 
for any value of n, where h = (b - a)/n as usual, and where $\xi_1$ and $\xi_2$ are in
$(a - h, b + h)$.
Many of the corresponding approximations generated by integrating
splines for which $s_0'' = s_n'' = 0$ may be found in the literature. When n = 2, 3,
and 4, the formulas are

$$
\int_a^b f(x)\,dx \approx \frac{h}{8}\,(3f_0 + 10f_1 + 3f_2)
\tag{9.13.17}
$$

$$
\int_a^b f(x)\,dx \approx \frac{h}{10}\,(4f_0 + 11f_1 + 11f_2 + 4f_3)
\tag{9.13.18}
$$

$$
\int_a^b f(x)\,dx \approx \frac{h}{28}\,(11f_0 + 32f_1 + 26f_2 + 32f_3 + 11f_4)
\tag{9.13.19}
$$

the simplicity of the forms being somewhat offset by the fact that the degree of
precision of each such formula is only 1.

In the case of the preceding numerical example, the approximations to the
integral

$$
\int_{1}^{1.8} x^{1/3}\,dx \approx 0.8921945
$$


afforded by (9.13.13) and (9.13.15) round to 

0.8922128 and  0.8921956 
while those given by (9.13.17) and (9.13.19) round to 

0.8918105 and 0.8921376 


Here the Richardson extrapolation is indicated. The first pair yields the value 
0.8921945, while the second pair yields 0.8921594. 
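
The figures quoted for the first pair can be reproduced with a short calculation. The Python sketch below evaluates the n = 2 and n = 4 spline-integration formulas for f(x) = x^(1/3) on [1, 1.8] and then combines them by one Richardson step. The weights are as read here from (9.13.13) and (9.13.15); the factor 16 assumes that the leading error term is proportional to h^4, and the function names are hypothetical.

```python
def f(x):
    return x ** (1.0 / 3.0)


def spline_integral(n, a=1.0, b=1.8):
    """Integral of the spline with end conditions (9.13.1)-(9.13.2), for n = 2 or 4."""
    h = (b - a) / n
    weights = {
        2: ([-1, 28, 90, 28, -1], 72),
        4: ([-3, 62, 163, 132, 163, 62, -3], 144),
    }
    w, denom = weights[n]
    ordinates = [f(a + k * h) for k in range(-1, n + 2)]   # f_{-1}, ..., f_{n+1}
    return h * sum(wi * fi for wi, fi in zip(w, ordinates)) / denom


i2, i4 = spline_integral(2), spline_integral(4)
extrapolated = (16 * i4 - i2) / 15          # one Richardson step, h^4 error assumed
exact = 0.75 * (1.8 ** (4.0 / 3.0) - 1.0)   # antiderivative (3/4) x^(4/3)
print(i2, i4, extrapolated, exact)
```

The same factor 16 applied to the second pair of values reproduces the less accurate extrapolated figure quoted above, which suggests that the h^4 assumption was used in both cases.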


9.14 Approximation by Continued Fractions 


Newton’s divided-difference polynomial interpolation formula (2.5.2), with an
error term, can be considered as the identity which results from writing

$$
f(x) = u_0(x)
\tag{9.14.1}
$$

and effecting the successive substitutions

$$
u_k(x) = u_k(x_k) + (x - x_k)\,u_{k+1}(x) \qquad (k = 0, 1, \ldots, n - 1)
\tag{9.14.2}
$$

with the abbreviation

$$
u_k(x) = f[x_0, x_1, \ldots, x_{k-1}, x]
\tag{9.14.3}
$$

Thus, for example, when n = 3 there follows

$$
\begin{aligned}
f(x) &= u_0(x_0) + (x - x_0)\{u_1(x_1) + (x - x_1)[u_2(x_2) + (x - x_2)u_3(x)]\} \\
     &= u_0(x_0) + (x - x_0)u_1(x_1) + (x - x_0)(x - x_1)u_2(x_2) + E(x) \\
     &= f[x_0] + (x - x_0)f[x_0, x_1] + (x - x_0)(x - x_1)f[x_0, x_1, x_2] + E(x)
\end{aligned}
\tag{9.14.4}
$$

where

$$
\begin{aligned}
E(x) &= (x - x_0)(x - x_1)(x - x_2)\,u_3(x) \\
     &= (x - x_0)(x - x_1)(x - x_2)\,f[x_0, x_1, x_2, x]
\end{aligned}
\tag{9.14.5}
$$
The algorithm for the calculation of the successive divided differences
follows directly from (9.14.2) and (9.14.3), with $x = x_k$, in the form

$$
f[x_0, \ldots, x_{k-1}, x_k]
  = \frac{f[x_0, \ldots, x_{k-2}, x_k] - f[x_0, \ldots, x_{k-2}, x_{k-1}]}{x_k - x_{k-1}}
\tag{9.14.6}
$$

The result of assuming that the (n + 1)st divided difference $u_{n+1}(x)$ is identically
zero (or that the nth divided difference is constant) is the equation of the polynomial
y(x), of degree n or less, which agrees with f(x) at the n + 1 points
$x_0, x_1, \ldots, x_n$. If $u_{n+1}(x)$ actually vanishes identically, then y(x) = f(x).

A great variety of other identities can be obtained in a similar way and
interpreted similarly as approximation formulas by making use of other sets of
transformations in place of (9.14.2). In particular, the substitution sequence

$$
f(x) = v_0(x) \qquad
v_k(x) = v_k(x_k) + \frac{x - x_k}{v_{k+1}(x)} \qquad (k = 0, 1, 2, \ldots)
\tag{9.14.7}
$$

leads to an interesting and useful result. We see that the first three substitutions
give

$$
\begin{aligned}
f(x) = v_0(x) &= v_0(x_0) + \frac{x - x_0}{v_1(x)}
   = v_0(x_0) + \cfrac{x - x_0}{v_1(x_1) + \cfrac{x - x_1}{v_2(x)}} \\
 &= v_0(x_0) + \cfrac{x - x_0}{v_1(x_1) + \cfrac{x - x_1}{v_2(x_2) + \cfrac{x - x_2}{v_3(x)}}}
\end{aligned}
\tag{9.14.8}
$$
More generally, we are thus led to the continued-fraction representation

$$
f(x) = a_0 + \cfrac{x - x_0}{a_1 + \cfrac{x - x_1}{a_2 + \cfrac{x - x_2}{a_3 + \cdots}}}
\tag{9.14.9}
$$

where

$$
a_k = v_k(x_k)
\tag{9.14.10}
$$

and where, when the fraction is terminated after n divisions, the constant $a_n$
is to be replaced by $a_n + (x - x_n)/v_{n+1}(x)$ in the last denominator. If we then
set $x = x_k$, where $0 \le k \le n$, the fraction terminates before the residual
$(x - x_n)/v_{n+1}(x)$ is introduced. Thus, since (9.14.9) is an identity, the result
of replacing $1/v_{n+1}(x)$ by zero (that is, terminating the fraction with $a_n$) will
give a function $r_n(x)$ which agrees with f(x) at the n + 1 points $x_0, \ldots, x_n$
under the assumptions that the constants $a_0, \ldots, a_n$ are actually existent and
that the portion of the truncated fraction inferior to $x - x_k$ does not vanish
when $x = x_k$, for $k = 0, \ldots, n - 1$. The result of this termination may be
called the nth convergent (or approximant) of the representation, with the
constant $a_0 = f(x_0)$ considered as the zeroth convergent.
If we introduce the notation

$$
v_k(x) = \phi_k[x_0, x_1, \ldots, x_{k-1}, x]
\tag{9.14.11}
$$

so that (9.14.10) becomes

$$
a_k = \phi_k[x_0, x_1, \ldots, x_{k-1}, x_k]
\tag{9.14.12}
$$

reference to (9.14.7) gives

$$
\begin{aligned}
\phi_0[x] &= f(x) \\
\phi_1[x_0, x] &= \frac{x - x_0}{\phi_0[x] - \phi_0[x_0]} = \frac{x - x_0}{f(x) - f(x_0)} \\
\phi_2[x_0, x_1, x] &= \frac{x - x_1}{\phi_1[x_0, x] - \phi_1[x_0, x_1]}
\end{aligned}
$$

and, in general,

$$
\phi_k[x_0, \ldots, x_{k-1}, x]
  = \frac{x - x_{k-1}}{\phi_{k-1}[x_0, \ldots, x_{k-2}, x] - \phi_{k-1}[x_0, \ldots, x_{k-2}, x_{k-1}]}
\tag{9.14.13}
$$

Accordingly, we have also

$$
\phi_k[x_0, \ldots, x_{k-1}, x_k]
  = \frac{x_k - x_{k-1}}{\phi_{k-1}[x_0, \ldots, x_{k-2}, x_k] - \phi_{k-1}[x_0, \ldots, x_{k-2}, x_{k-1}]}
\tag{9.14.14}
$$

Thus $\phi_1[x_0, x_1]$ is the inverted first divided difference of f(x), relative to
$x_0$ and $x_1$, $\phi_2[x_0, x_1, x_2]$ is the inverted divided difference of the inverted
first divided difference $\phi_1[x_0, x]$, relative to $x_1$ and $x_2$, ..., and, in
general, $\phi_k[x_0, \ldots, x_{k-2}, x_{k-1}, x_k]$ is the inverted divided difference of
$\phi_{k-1}[x_0, \ldots, x_{k-2}, x]$, relative to $x_{k-1}$ and $x_k$. For brevity, we will refer to the
quantity defined by (9.14.13) as a kth inverted difference of f(x).

Whereas the definition shows that always the inverted difference
$\phi_k[x_0, \ldots, x_{k-2}, x_{k-1}, x_k]$ is symmetric in its last two arguments $x_{k-1}$ and $x_k$,
it is not generally symmetric in its other arguments.† Thus it must be formed
from the specific inverted differences $\phi_{k-1}[x_0, \ldots, x_{k-2}, x_{k-1}]$ and
$\phi_{k-1}[x_0, \ldots, x_{k-2}, x_k]$, which possess its first k - 1 arguments in common.
The following calculational arrangement is convenient for this purpose:

$$
\begin{array}{lllll}
x_0 & f(x_0) \\
x_1 & f(x_1) & \phi_1[x_0, x_1] \\
x_2 & f(x_2) & \phi_1[x_0, x_2] & \phi_2[x_0, x_1, x_2] \\
x_3 & f(x_3) & \phi_1[x_0, x_3] & \phi_2[x_0, x_1, x_3] & \phi_3[x_0, x_1, x_2, x_3] \\
\cdots & \cdots & \cdots & \cdots & \cdots
\end{array}
$$

Here, for example, we have

$$
\phi_2[x_0, x_1, x_3] = \frac{x_3 - x_1}{\phi_1[x_0, x_3] - \phi_1[x_0, x_1]}
$$

The diagonal elements thus are the desired constants $a_0, a_1, a_2, a_3, \ldots$ which
appear in (9.14.9).

† A related quantity, which possesses complete symmetry, is considered in Sec. 9.17.

In illustration, for the given data


  x    |  0     1     2     3     4      5      6
  f(x) |  2    3/2   4/5   1/2   6/17   7/26   8/37


we may number the abscissas in increasing algebraic order and, accordingly, 
form the array 


  x     f       φ1       φ2      φ3     φ4
  0     2
  1    3/2      -2
  2    4/5     -5/3       3
  3    1/2      -2        ∞       0
  4    6/17   -17/7      -7     -1/5    -5
  5    7/26   -26/9     -9/2    -2/5    -5
  6    8/37   -37/11   -11/3    -3/5    -5


for which the fourth inverted differences are equal. Thus, if we use only the
first five points $x_0 = 0$, $x_1 = 1$, $x_2 = 2$, $x_3 = 3$, and $x_4 = 4$, we have
$a_0 = 2$, $a_1 = -2$, $a_2 = 3$, $a_3 = 0$, and $a_4 = -5$, so that (9.14.9) becomes

$$
f(x) \approx 2 + \cfrac{x}{-2 + \cfrac{x - 1}{3 + \cfrac{x - 2}{0 + \cfrac{x - 3}{-5}}}} = r_4(x)
\tag{9.14.15}
$$

where the approximation would become exact if the last denominator -5
were replaced by the (unknown) quantity $-5 + (x - 4)/\phi_5[0, 1, 2, 3, 4, x]$.
Thus $r_4(x)$ may be expected to agree with f(x) at the five points employed in its
determination. In addition, since the tabular array shows that the same approximation
would be obtained if the abscissa x = 4 were replaced by either x = 5
or x = 6, it may be expected that $r_4(x)$ will agree with f(x) at those two points
as well.

Successive reductions will convert the right-hand member of (9.14.15)
to the simpler form

$$
r_4(x) = \frac{2 + x}{1 + x^2}
\tag{9.14.16}
$$

if this reduction is desired, and the agreement can be verified directly. Furthermore,
the respective approximating convergents corresponding to termination
of the fraction with $a_0$, $a_1$, and $a_2$, and hence to collocation at one, two, and
three successive points, are found to be

$$
r_0(x) = 2 \qquad r_1(x) = \frac{4 - x}{2} \qquad r_2(x) = \frac{14 - 5x}{7 - x}
$$

However, since the present example is exceptional to the extent that $a_3 = 0$,
the third convergent does not agree with f(x) at the four points x = 0, 1, 2, and
3. Indeed, this convergent is identical with $r_1(x)$, which agrees with f(x) at
x = 0, 1, and also at x = 3, but does not do so at x = 2.
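
The direct evaluation of a truncated fraction such as (9.14.15) proceeds from the last partial denominator backward. The following Python sketch does this in exact rational arithmetic for the constants 2, -2, 3, 0, -5 of the example and compares the result with (2 + x)/(1 + x^2) at a few nontabular points. The function name `eval_continued_fraction` is hypothetical, and the sketch makes no attempt to treat arguments at which an inner denominator vanishes.

```python
from fractions import Fraction


def eval_continued_fraction(a, xs, x):
    """Evaluate a_0 + (x - x_0)/(a_1 + (x - x_1)/(a_2 + ...)) from the last term backward."""
    value = Fraction(a[-1])
    for k in range(len(a) - 2, -1, -1):
        value = a[k] + (x - xs[k]) / value   # raises ZeroDivisionError if value is zero
    return value


a = [2, -2, 3, 0, -5]        # diagonal of the inverted-difference array
xs = [0, 1, 2, 3]            # abscissas entering the partial numerators
for x in (Fraction(5), Fraction(6), Fraction(7, 2)):
    print(x, eval_continued_fraction(a, xs, x), (2 + x) / (1 + x * x))

# Shorter lists give the earlier convergents, for example r_2 at x = 2:
print(eval_continued_fraction(a[:3], xs, Fraction(2)))
```

Because the innermost quantity is formed first, each additional convergent requires a fresh pass, which is the motivation for the forward recurrences of Sec. 9.16.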


9.15 Rational Approximations and Continued Fractions 


It is easily seen (inductively) that the nth convergent of the continued fraction
(9.14.9) is expressible in the form

$$
r_n(x) = \frac{\alpha_0 + \alpha_1 x + \cdots + \alpha_p x^p}{\beta_0 + \beta_1 x + \cdots + \beta_{p-1} x^{p-1}}
\qquad (n = 2p - 1)
\tag{9.15.1a}
$$

if n is odd, where $\alpha_p \ne 0$, and in the form

$$
r_n(x) = \frac{\alpha_0 + \alpha_1 x + \cdots + \alpha_p x^p}{\beta_0 + \beta_1 x + \cdots + \beta_p x^p}
\qquad (n = 2p)
\tag{9.15.1b}
$$

if n is even, where $\beta_p \ne 0$. Thus the nth convergent affords an approximation
to f(x) by a ratio of polynomials, that is, by a rational function of x, which
generally agrees with f(x) at the n + 1 points $x_0, x_1, \ldots, x_n$ if $a_n \ne 0$ and if all
preceding a’s are finite.

This situation is in accordance with the fact that, since the numerator
and denominator of either form of (9.15.1) can be divided through by any one
of the nonzero constants, the first form involves 2p independent parameters
and the second form 2p + 1 such parameters, so that in either case n + 1
independent constants are available for the determination of the approximation.

Given a set of n + 1 distinct points, there cannot exist more than one
irreducible rational function† of the form (9.15.1) which takes on prescribed
values at those points. The proof follows simply by first writing (9.15.1) in
the form $r_n(x) = M_n(x)/N_n(x)$, where $M_n$ and $N_n$ are polynomials, and supposing
that another such ratio, $\bar M_n(x)/\bar N_n(x)$, takes on the same values as does $r_n(x)$
at n + 1 distinct points. In accordance with (9.15.1), the degrees of $M_n$ and
$\bar M_n$ cannot exceed n/2 when n is even or (n + 1)/2 when n is odd, whereas the
degrees of $N_n$ and $\bar N_n$ cannot exceed n/2 when n is even or (n - 1)/2 when n is
odd. It then follows that the function $M_n(x)\bar N_n(x) - \bar M_n(x)N_n(x)$ also vanishes
at those points. But, since this function is a polynomial of degree n or less, it
must therefore vanish identically, so that $M_n(x)\bar N_n(x) = \bar M_n(x)N_n(x)$. Under
the assumption that $r_n(x)$ is irreducible, $M_n$ and $N_n$ possess no common linear
factors. Thus all linear factors of $M_n$ must also be factors of $\bar M_n$, and the converse
is also true since $\bar M_n/\bar N_n$ is also assumed to be irreducible. The same
argument applies to $N_n$ and $\bar N_n$, so that the respective numerators and denominators
can differ only to the extent of a common constant multiplicative factor,
as was to be shown.

However, there may be no such function. For example, if we attempt
to determine directly a function of the form

$$
r_3(x) = \frac{\alpha_0 + \alpha_1 x + \alpha_2 x^2}{\beta_0 + \beta_1 x}
$$

which takes on the values prescribed in the preceding example at the four points
x = 0, 1, 2, and 3, we must solve the simultaneous equations

$$
\begin{aligned}
2\beta_0 &= \alpha_0 \\
\tfrac32(\beta_0 + \beta_1) &= \alpha_0 + \alpha_1 + \alpha_2 \\
\tfrac45(\beta_0 + 2\beta_1) &= \alpha_0 + 2\alpha_1 + 4\alpha_2 \\
\tfrac12(\beta_0 + 3\beta_1) &= \alpha_0 + 3\alpha_1 + 9\alpha_2
\end{aligned}
$$

which result from clearing fractions and equating the resultant members at
the four points, and we find that the general solution is given by the relations

$$
\alpha_0 = 8\alpha_2 \qquad \alpha_1 = -6\alpha_2 \qquad \beta_0 = 4\alpha_2 \qquad \beta_1 = -2\alpha_2
$$

where $\alpha_2$ is arbitrary. Thus the assumed form becomes

$$
r_3(x) = \frac{8 - 6x + x^2}{4 - 2x} = \frac{(4 - x)(2 - x)}{2(2 - x)}
$$


† A rational function is said to be reducible if its numerator and denominator possess
a common polynomial factor other than a constant.
and is reducible to $r_3(x) = (4 - x)/2$, in accordance with the result obtained
from (9.14.15). The original form is indeterminate at x = 2, whereas the reduced
form does not take on the prescribed value at that point. Thus the defect of the
third convergent of (9.14.15) is due to the nonexistence of a form of the type
required at that stage rather than to a failure of the determining process.

In the case of (9.14.15), a warning was served by the fact that $a_3 = 0$.
It should be remarked, however, that the kth convergent may be defective,
for the same reason as above, even though $a_k$ does not vanish, although this
situation is an unusual one. In illustration, the data

  x    |  0   1   2   3
  f(x) |  2   1   1   0

lead to the inverted-difference array

  x    f     φ1     φ2     φ3
  0    2
  1    1     -1
  2    1     -2     -1
  3    0    -3/2    -4    -1/3

in which no diagonal element vanishes. Whereas the corresponding
approximation

$$
f(x) \approx 2 + \cfrac{x}{-1 + \cfrac{x - 1}{-1 + \cfrac{x - 2}{-\frac13}}} = \frac{12 - 13x + 3x^2}{6 - 4x}
$$

is properly exact at the four tabular points, the second convergent of the fraction
is seen to be

$$
r_2(x) = 2 + \cfrac{x}{-1 - (x - 1)} = 2 - \frac{x}{x}
$$

and is undefined at the tabular point x = 0. It is easily verified that there exists
no irreducible fraction of the form $(\alpha_0 + \alpha_1 x)/(\beta_0 + \beta_1 x)$ which takes on the
first three prescribed values.

On the other hand, even though the n + 1 given data serve to determine a
rational approximation of the form (9.15.1), the continued-fraction expansion
will fail to exist, in the form assumed, if $a_k = \infty$ for some k < n. Thus, whereas
the data

  x    |  0   1    2     3     4
  f(x) |  1   1   3/5   2/5   5/17

correspond to the function

$$
f(x) = \frac{1 + x}{1 + x^2}
$$

it is seen that $\phi_1[0, 1] = \infty$, so that there exists no expansion of the form

$$
f(x) = a_0 + \cfrac{x}{a_1 + \cfrac{x - 1}{a_2 + \cdots}}
$$

which takes on the prescribed values when $x_0 = 0$, $x_1 = 1$, and $x_2 = 2$.
This difficulty can be averted here by reordering the abscissas in such a way that
the equal ordinates are not consecutive. (See also Prob. 58.) Thus, if we take
$x_0 = 0$, $x_1 = 2$, $x_2 = 1$, $x_3 = 3$, and $x_4 = 4$, we obtain the following array:


  x     f       φ1      φ2    φ3    φ4
  0     1
  2    3/5      -5
  1     1        ∞       0
  3    2/5      -5       ∞     0
  4    5/17   -17/3     -3    -1    -1
  5    3/13   -13/2     -2    -2    -1


The additional line is included to illustrate the constancy of the fourth inverted
difference in the present case.
From these results we deduce the approximation

$$
f(x) \approx 1 + \cfrac{x}{-5 + \cfrac{x - 2}{0 + \cfrac{x - 1}{0 + \cfrac{x - 3}{-1}}}}
$$

which properly reduces exactly to $(1 + x)/(1 + x^2)$. In this form, the successive
convergents are 1, (5 - x)/5, 1, (5 - x)/5, and $(1 + x)/(1 + x^2)$.
As was predicted by the presence of the zeros, the second and third convergents
are both defective, in that the second takes on the prescribed values only at the
first and third points while the third does so only at the first, second, and fourth
points. If, for example, the abscissas are taken in the order 1, 2, 3, 4, 0, we
obtain the form

$$
f(x) = 1 + \cfrac{x - 1}{-\frac52 + \cfrac{x - 2}{-\frac65 + \cfrac{x - 3}{\frac{35}{2} + \cfrac{x - 4}{\frac15}}}}
$$

which naturally also reduces to $(1 + x)/(1 + x^2)$ but which possesses no
defective convergents.
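
The constants of such an expansion are the diagonal entries of the inverted-difference array, and the array is easily formed column by column. The Python sketch below does this in exact rational arithmetic for f(x) = (1 + x)/(1 + x^2). As an assumption made here to keep every array entry finite, the fifth abscissa is taken as x = 5 rather than x = 0 (the entry for x = 0 in the first difference column would be infinite); since the fourth inverted difference is constant for this function, the same final constant results. The function name is hypothetical.

```python
from fractions import Fraction


def inverted_difference_coefficients(xs, fs):
    """Return a_0, a_1, ..., the diagonal of the inverted-difference array."""
    column = list(fs)                        # the phi_0 column
    a = [column[0]]
    for k in range(1, len(xs)):
        # phi_k[x_0..x_{k-1}, x_i] from phi_{k-1}[.., x_i] and phi_{k-1}[.., x_{k-1}]
        column = [None] * k + [
            (xs[i] - xs[k - 1]) / (column[i] - column[k - 1])
            for i in range(k, len(xs))
        ]
        a.append(column[k])
    return a


def f(x):
    return (1 + x) / (1 + x * x)


xs = [Fraction(v) for v in (1, 2, 3, 4, 5)]
print(inverted_difference_coefficients(xs, [f(x) for x in xs]))
```

A division by zero in this sketch signals an infinite inverted difference, in which case a reordering of the abscissas may be tried.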

In the usual cases in practice, the ordinates can be introduced in any order. 
For calculation near the beginning of a tabulation, it is usually desirable to 
number the abscissas in increasing algebraic order, whereas near the end of a 
tabulation the reverse numbering is desirable, in analogy to the Newton forward- 
and backward-difference polynomial interpolation formulas. Inside the tabular 
range it often is desirable first to introduce the abscissa nearest the abscissa of 
the interpolant, and then successively to introduce abscissas at increasing 
distance from $x_0$, alternately forward and backward, in analogy to the central-
difference interpolation formulas. These choices tend to maximize the effective
initial rate of convergence of the sequence of successive convergents in practical 
situations for which the sequence generally does not terminate and for which the 
number of convergents needed to supply a specified accuracy usually cannot be 
predicted easily in advance. 


9.16 Determination of Convergents of Continued Fractions 


Since the direct evaluation of a truncated continued fraction necessarily “begins 
at the end,” for the purpose of calculating values of a forward sequence of 
convergents or of expressing them as ratios of polynomials it is useful to 
proceed recursively by use of the formulas to be derived next. 

From the definition (9.14.7), it can be seen that f(x) is expressible as the
ratio of two linear functions of any $v_k(x)$, say, in the form

$$
f(x) = \frac{R_k(x) + v_{k+1}(x)M_k(x)}{S_k(x) + v_{k+1}(x)N_k(x)}
\tag{9.16.1}
$$

In order to determine $M_k$, $N_k$, $R_k$, and $S_k$, we may notice that f(x) consequently
also is given by the result of replacing k by k - 1 in (9.16.1) and then using
(9.14.7) to express $v_k(x)$ in terms of $v_{k+1}(x)$, so that there must follow

$$
\frac{R_k(x) + v_{k+1}(x)M_k(x)}{S_k(x) + v_{k+1}(x)N_k(x)}
 = \frac{(x - x_k)M_{k-1}(x) + v_{k+1}(x)[a_kM_{k-1}(x) + R_{k-1}(x)]}
        {(x - x_k)N_{k-1}(x) + v_{k+1}(x)[a_kN_{k-1}(x) + S_{k-1}(x)]}
\tag{9.16.2}
$$

This requirement is satisfied, for arbitrary $v_{k+1}$, if the desired functions
satisfy the relations

$$
\begin{aligned}
M_{k+1}(x) &= a_{k+1}M_k(x) + (x - x_k)M_{k-1}(x) \\
N_{k+1}(x) &= a_{k+1}N_k(x) + (x - x_k)N_{k-1}(x) \\
R_k(x) &= (x - x_k)M_{k-1}(x) \\
S_k(x) &= (x - x_k)N_{k-1}(x)
\end{aligned}
\tag{9.16.3}
$$
in accordance with which $M_k$, $N_k$, $R_k$, and $S_k$ clearly will be polynomials in x if
$M_{-1}$, $M_0$, $N_{-1}$, and $N_0$ are polynomials. Thus we may write (9.16.1) in the form

$$
f(x) = \frac{M_k(x) + (x - x_k)M_{k-1}(x)/\phi_{k+1}[x_0, \ldots, x_k, x]}
            {N_k(x) + (x - x_k)N_{k-1}(x)/\phi_{k+1}[x_0, \ldots, x_k, x]}
\tag{9.16.4}
$$

Since, when k = 0, this form must reduce to the form given by (9.14.7)

$$
f(x) = a_0 + \frac{x - x_0}{\phi_1[x_0, x]}
$$

we must have $N_{-1}(x) = 0$, $M_0(x)/N_0(x) = a_0$, and $M_{-1}(x)/N_0(x) = 1$. It is
convenient to take $N_0(x) = 1$.
Thus $M_k$ and $N_k$ can be determined by the recurrence formulas

$$
M_{k+1}(x) = a_{k+1}M_k(x) + (x - x_k)M_{k-1}(x) \qquad M_{-1}(x) = 1 \quad M_0(x) = a_0
\tag{9.16.5}
$$

and

$$
N_{k+1}(x) = a_{k+1}N_k(x) + (x - x_k)N_{k-1}(x) \qquad N_{-1}(x) = 0 \quad N_0(x) = 1
\tag{9.16.6}
$$

In particular, the kth convergent to f(x) is given simply by

$$
r_k(x) = \frac{M_k(x)}{N_k(x)}
\tag{9.16.7}
$$

The error associated with the approximation $f(x) \approx r_k(x)$ can be estimated by
use of (9.16.4) if information with regard to $\phi_{k+1}[x_0, \ldots, x_k, x]$ is available,
say, in the form of sample values of the (k + 1)th inverted differences formed
with $x_0, \ldots, x_k$ as its first k + 1 arguments. For this purpose it is convenient
to rewrite (9.16.4) in the equivalent form

$$
f(x) - r_k(x) = -\frac{\varepsilon_k(x)}{1 + \varepsilon_k(x)}\,[r_k(x) - r_{k-1}(x)]
\tag{9.16.8}
$$

where

$$
\varepsilon_k(x) = \frac{(x - x_k)N_{k-1}(x)}{\phi_{k+1}[x_0, \ldots, x_k, x]\,N_k(x)}
\tag{9.16.9}
$$

(See also Prob. 63.)

When f(x) is not a rational function, the sequence of convergents generally
is infinite, and it may or may not tend to f(x) as $n \to \infty$. However, it generally
at least approaches f(x) more and more closely, for any fixed value of x inside
the range of the tabular values $x_0, \ldots, x_n$, as n increases up to a certain stage.
The determination of successive convergents is desirable in order that the rate
of effective convergence may be estimated.
It is useful to notice that the expansion (9.14.9) can be expressed in the
alternative forms

$$
f(x) = a_0 + \cfrac{x - x_0}{a_1 + \cfrac{x - x_1}{a_2 + \cfrac{x - x_2}{a_3 + \cdots}}}
     = a_0 + \cfrac{\dfrac{x - x_0}{a_1}}{1 + \cfrac{\dfrac{x - x_1}{a_1a_2}}{1 + \cfrac{\dfrac{x - x_2}{a_2a_3}}{1 + \cdots}}}
\tag{9.16.10}
$$

if none of the $a_i$ vanish. The more compact symbolic arrangements

$$
f(x) = a_0 + \frac{x - x_0}{a_1 +}\;\frac{x - x_1}{a_2 +}\;\frac{x - x_2}{a_3 +}\cdots
$$

and

$$
f(x) = a_0 + \frac{x - x_0\,|}{|\,a_1} + \frac{x - x_1\,|}{|\,a_2} + \frac{x - x_2\,|}{|\,a_3} + \cdots
$$

of the first form are often used.

Approximation by rational functions is often useful in the neighborhood
of a point a at which the function f(x) becomes infinite in proportion to
$1/(x - a)^m$, where m is a positive integer whose value may or may not be
known.† In illustration, the following calculation is for the purpose of
determining an approximation to cot 0.15 (= 6.6166) from the given three-place
values.


 k | x_k | f_k   | φ1        | φ2      | φ3    | x - x_k | M_k      | N_k       | r_k
 0 | 0.1 | 9.967 |           |         |       |  0.05   |  9.967   |  1        | 9.967
 1 | 0.2 | 4.933 | -0.019865 |         |       | -0.05   | -0.14799 | -0.019865 | 7.450
 2 | 0.3 | 3.233 | -0.029700 | -10.168 |       | -0.15   |  1.00641 |  0.15199  | 6.622
 3 | 0.4 | 2.365 | -0.039463 | -10.205 | -2.70 | -0.25   | -2.69511 | -0.40739  | 6.6156

Here, for example, we have from (9.16.5) and (9.16.6) the computations

$$
M_2 = (-10.168)(-0.14799) + (-0.05)(9.967) = 1.00641
$$

$$
N_3 = (-2.70)(0.15199) + (-0.15)(-0.019865) = -0.40739
$$

In the calculation of the successive inverted differences, about one more
digit was retained in each inverted difference than would be expected to be
significant if all digits retained in the two preceding entries, from which it is
calculated, were correct, when account is taken of the loss of significant figures
in the subtractions involved. The tabulated a’s are then treated as though
they were exact in the calculation of the M’s and N’s, so that, for example,
(at least) five digits are retained in $N_3$, even though not more than three of its
digits would be significant if the value -2.70 were correct to the places given,
and only to those places.

† The application of polynomial approximation to 1/f(x) or to $(x - a)^m f(x)$ may be
appropriate alternatives.

Because of the fact that errors in the given ordinates and in the a’s,
M’s, and N’s enter into the determination of the required r’s in a nonlinear
way, it is difficult to estimate in advance the number of digits which should be
retained in each intermediate calculation, but it is usually desirable to retain at
least as many digits as are required by the preceding rule.‡ In the present case,
the tabulated value of $a_3$ would be modified in its third digit if additional digits
were retained in the calculation of preceding divided differences, and this
modification would change $M_3$ and $N_3$ in the third digit. However, the calculated
value of the ratio $r_3 = M_3/N_3$ would be modified by only two units in
its fifth digit. The deviation of the calculated value $r_3$ from the true value, by
one unit in its fourth digit, is due principally to the roundoff errors in the given
data (see Prob. 65).

The fact that the value of the convergent $r_3$ itself is not sensitive to appreciable
errors in $a_3$ can be seen more directly by inspection of the actual
truncated continued fraction:

$$
r_3 = 9.967 + \cfrac{0.05}{-0.019865 + \cfrac{-0.05}{-10.168 + \cfrac{-0.15}{-2.70}}}
$$

In this particular example, the near linearity of the first inverted difference
$\phi_1[0.1, x]$ suggests the use of polynomial extrapolation over the three available
values for the determination of $\phi_1[0.1, 0.15]$. The use of Newton’s forward-difference
formula (retaining the second difference) gives $\phi_1[0.1, 0.15] \approx -0.014923$,
so that there follows

$$
\frac{0.15 - 0.10}{f(0.15) - 9.967} \approx -0.014923 \qquad f(0.15) \approx 6.6165
$$

and the calculation of inverted differences of higher order is avoided. (See also
Prob. 59.)
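
The insensitivity of the third convergent to the last constant can also be seen by repeating the cotangent calculation in floating point. The Python sketch below forms the inverted differences from the three-place data in full double precision (rather than with the rounding used in the table, so the intermediate figures differ slightly from the tabulated ones) and evaluates the truncated fraction at x = 0.15 for two trial values of the last constant. The helper name `r3` is hypothetical.

```python
import math

xs = [0.1, 0.2, 0.3, 0.4]
fs = [9.967, 4.933, 3.233, 2.365]          # three-place values of cot x

# Diagonal of the inverted-difference array, carried in full double precision.
col, a = fs[:], [fs[0]]
for k in range(1, len(xs)):
    col = [None] * k + [(xs[i] - xs[k - 1]) / (col[i] - col[k - 1])
                        for i in range(k, len(xs))]
    a.append(col[k])


def r3(a3, x=0.15):
    """Third convergent at x with the last constant replaced by the trial value a3."""
    return a[0] + (x - xs[0]) / (a[1] + (x - xs[1]) / (a[2] + (x - xs[2]) / a3))


print(a)                                    # compare a[3] with the tabulated -2.70
print(r3(a[3]), r3(-2.70), 1.0 / math.tan(0.15))
```

The two values of the convergent differ far less than the two trial constants do, while both remain at a distance from the true cotangent that is set mainly by the rounding of the data.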


‡ In addition, it should be noted that possible rapid growth of $M_k$ and $N_k$ in special
situations may occasion overflow in machine calculation, unless proper precautions
are taken.
It can be shown (see Prob. 61) that the kth convergent of (9.16.10) can be
expressed in the form

$$
r_k = a_0 + \frac{x - x_0}{N_0N_1} - \frac{(x - x_0)(x - x_1)}{N_1N_2}
      + \frac{(x - x_0)(x - x_1)(x - x_2)}{N_2N_3} - \cdots
      + (-1)^{k+1}\,\frac{(x - x_0)(x - x_1)\cdots(x - x_{k-1})}{N_{k-1}N_k}
\tag{9.16.11}
$$

In the case of the preceding example, the kth convergent $r_k$ is thus obtained
by retaining k + 1 terms in the sum

$$
9.967 - \frac{0.05}{0.019865} - \frac{0.0025}{0.0030193} - \frac{0.000375}{0.06192}
$$

and successive results agree, to three decimal places, with the values obtained
previously.

If a specific rational approximation $r_n(x)$ is to be used repeatedly for
numerical calculation, it is significant that usually neither the continued-fraction
form (9.16.10) nor the form (9.16.7) [or (9.15.1)] provides the most efficient
specification of the approximation in terms of the number of necessary “long
operations” (multiplications and divisions). The evaluation of the nth convergent
of (9.16.10) involves n divisions, whereas the most efficient evaluation of the
ratio of polynomials in (9.15.1a) or (9.15.1b) generally requires n - 1 multiplications
and one division [when one coefficient is reduced to unity in advance
by preliminary division, and when the polynomials are evaluated recursively
by Horner’s method, as in (1.8.6)].

On the other hand, except in certain special cases which require individual
treatment (see Probs. 69 and 70), it is possible to transform the expression for
$r_n(x)$ to the form

$$
r_n(x) = A_0x + B_0 + \frac{A_1\,|}{|\,x + B_1} + \frac{A_2\,|}{|\,x + B_2} + \cdots + \frac{A_{p-1}\,|}{|\,x + B_{p-1}}
\tag{9.16.12}
$$

when n = 2p - 1, and

$$
r_n(x) = B_0 + \frac{A_1\,|}{|\,x + B_1} + \frac{A_2\,|}{|\,x + B_2} + \cdots + \frac{A_p\,|}{|\,x + B_p}
\tag{9.16.13}
$$

when n = 2p. The second form involves only n/2 divisions, the first form
(n - 1)/2 divisions and one multiplication.


9.17 Thiele’s Continued-Fraction Approximations 


Whereas the kth inverted difference $\phi_k[x_0, \ldots, x_{k-2}, x_{k-1}, x_k]$ of a function
f(x) is symmetric only in its last two arguments, it happens that the quantity

$$
\rho_k[x_0, \ldots, x_k] = \phi_k[x_0, \ldots, x_k] + \phi_{k-2}[x_0, \ldots, x_{k-2}]
   + \phi_{k-4}[x_0, \ldots, x_{k-4}] + \cdots
\tag{9.17.1}
$$

is symmetric in all its k + 1 arguments. Here the last term on the right is
$\phi_0[x_0]$ if k is even, and is $\phi_1[x_0, x_1]$ if k is odd. This quantity is often known
as a kth reciprocal difference of f(x).
In particular, we have

$$
\rho_0[x_0] = \phi_0[x_0] = f(x_0) \qquad
\rho_1[x_0, x_1] = \phi_1[x_0, x_1] = \frac{x_1 - x_0}{f(x_1) - f(x_0)}
\tag{9.17.2}
$$

and calculation shows that

$$
\rho_2[x_0, x_1, x_2] = \phi_0[x_0] + \phi_2[x_0, x_1, x_2]
 = \frac{x_0f_0(f_1 - f_2) + x_1f_1(f_2 - f_0) + x_2f_2(f_0 - f_1)}
        {x_0(f_1 - f_2) + x_1(f_2 - f_0) + x_2(f_0 - f_1)}
\tag{9.17.3}
$$

in which cases the symmetry is apparent. Although an inductive generalization
is possible, the following argument is considerably more simple.

It is easily verified that when use is made of (9.16.5) and (9.16.6), the nth
convergent of (9.14.9) is given by (9.15.1a) when n = 2p - 1, with $\alpha_p = 1$
and $\beta_{p-1} = \rho_{2p-1}[x_0, \ldots, x_{2p-1}]$, and by (9.15.1b) when n = 2p, with
$\alpha_p = \rho_{2p}[x_0, \ldots, x_{2p}]$ and $\beta_p = 1$. Thus it follows that $\rho_{2p}$ is the ratio of the
leading coefficients in the numerator and denominator of the rational function
of form (9.15.1b) which agrees with f(x) at the 2p + 1 points $x_0, \ldots, x_{2p}$,
whereas $\rho_{2p-1}$ is the reciprocal of that ratio for the rational function of form
(9.15.1a) which agrees with f(x) at the 2p points $x_0, \ldots, x_{2p-1}$. Clearly these
ratios are independent of any ordering of the points involved.

Since (9.17.1) implies that

$$
\rho_k[x_0, \ldots, x_k] - \rho_{k-2}[x_0, \ldots, x_{k-2}] = \phi_k[x_0, \ldots, x_k]
\tag{9.17.4}
$$

reference to (9.14.13) shows that the successive reciprocal differences may be
obtained by use of the recurrence formula

$$
\rho_k[x_0, \ldots, x_{k-2}, x_{k-1}, x_k]
 = \frac{x_k - x_{k-1}}{\rho_{k-1}[x_0, \ldots, x_{k-2}, x_k] - \rho_{k-1}[x_0, \ldots, x_{k-2}, x_{k-1}]}
   + \rho_{k-2}[x_0, \ldots, x_{k-2}]
\tag{9.17.5}
$$

Although this formula is less simply applied than (9.14.13), the symmetry
of the kth reciprocal difference permits its calculation from any two (k - 1)th
reciprocal differences having k - 1 of its arguments in common, together with
the (k - 2)th reciprocal difference formed with those arguments.
Thus, in particular, a reciprocal-difference table may be constructed in the
convenient form

$$
\begin{array}{lllll}
x_0 & f(x_0) \\
    &        & \rho_1[x_0, x_1] \\
x_1 & f(x_1) &                  & \rho_2[x_0, x_1, x_2] \\
    &        & \rho_1[x_1, x_2] &                       & \rho_3[x_0, x_1, x_2, x_3] \\
x_2 & f(x_2) &                  & \rho_2[x_1, x_2, x_3] \\
    &        & \rho_1[x_2, x_3] \\
x_3 & f(x_3)
\end{array}
$$

From this table we may determine the coefficients in (9.14.9) by combining
(9.14.12) and (9.17.4), so that

$$
\begin{aligned}
a_0 &= f(x_0) & a_1 &= \rho_1[x_0, x_1] \\
a_2 &= \rho_2[x_0, x_1, x_2] - f(x_0) & a_3 &= \rho_3[x_0, x_1, x_2, x_3] - \rho_1[x_0, x_1]
\end{aligned}
$$

and so forth. Thus the required coefficients are formed from (but are not
identical with) reciprocal differences appearing in the forward diagonal beginning
with $f(x_0)$. Furthermore, because of the symmetry, the data from the same
table are available for the determination of formulas in which the ordinates are
introduced in other orders, by choosing difference paths made up of suitable
contiguous diagonal segments as was done in Sec. 2.5. Each such expansion is
identical with the one which would be obtained more simply by the use of the
inverted-difference array corresponding to an appropriate reordering of the
abscissas, but only one array of reciprocal differences is needed for the formation
of the entire set. Thus the use of reciprocal differences, rather than inverse
differences, generally is advantageous only if several such formulas are required.
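
The construction of the reciprocal-difference table, and the recovery of the continued-fraction constants from its forward diagonal, can be sketched as follows in Python. The sketch uses (9.17.5) in the symmetric form in which the two (k - 1)th differences share their k - 1 interior arguments, applies it to f(x) = (1 + x)/(1 + x^2) with the abscissas in the order 1, 2, 3, 4, 0, and works in exact rational arithmetic. The function name `reciprocal_difference_table` is hypothetical.

```python
from fractions import Fraction


def reciprocal_difference_table(xs, fs):
    """rho[k][i] = rho_k[x_i, ..., x_{i+k}], built column by column."""
    n = len(xs)
    rho = [list(fs)]
    for k in range(1, n):
        col = []
        for i in range(n - k):
            shared = rho[k - 2][i + 1] if k >= 2 else 0   # rho_{k-2} on the shared arguments
            col.append((xs[i + k] - xs[i]) / (rho[k - 1][i + 1] - rho[k - 1][i]) + shared)
        rho.append(col)
    return rho


def f(x):
    return (1 + x) / (1 + x * x)


xs = [Fraction(v) for v in (1, 2, 3, 4, 0)]
rho = reciprocal_difference_table(xs, [f(x) for x in xs])
top = [rho[k][0] for k in range(len(xs))]                  # forward diagonal from f(x_0)
a = [top[k] - (top[k - 2] if k >= 2 else 0) for k in range(len(xs))]
print(top)
print(a)
```

The last diagonal entry vanishes here, in agreement with the interpretation of an even-order reciprocal difference as a ratio of leading coefficients: the numerator 1 + x has no term in x^2.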
However, the definition of the reciprocal difference is particularly useful
in the important limiting case when the abscissas $x_0, x_1, x_2, \ldots$ all become
coincident, so that the requirement that the deviation between f(x) and the kth
convergent of the fraction vanish at k + 1 distinct points is replaced by the
requirement that the deviation and its first k derivatives vanish at a single point
$x_0$. The representation (9.14.9) then tends to the form

$$
f(x) = a_0 + \cfrac{x - x_0}{a_1 + \cfrac{x - x_0}{a_2 + \cfrac{x - x_0}{a_3 + \cdots}}}
\tag{9.17.6}
$$

where

$$
a_k = \phi_k(x_0)
\tag{9.17.7}
$$

with the function $\phi_k(x)$ defined by the limiting process

$$
\phi_k(x) = \lim_{x_0, \ldots, x_k \to x} \phi_k[x_0, \ldots, x_k]
\tag{9.17.8}
$$

under the assumption that this limit exists for k = 0, 1, .... Here, if the
fraction is terminated after k divisions, it is necessary to replace $\phi_k(x_0)$ by
$\phi_k(x_0) + (x - x_0)/\phi_{k+1}[x_0, \ldots, x_0, x]$ in order to restore true equality.

The consideration of this limit is complicated by the fact that the k + 1
arguments $x_0, \ldots, x_k$ are not symmetrically involved. Thus it is desirable to
use (9.17.4) to express (9.17.8) in the form

$$
\phi_k(x) = \lim_{x_0, \ldots, x_k \to x} \{\rho_k[x_0, \ldots, x_k] - \rho_{k-2}[x_0, \ldots, x_{k-2}]\}
\tag{9.17.9}
$$

so that both terms on the right are symmetric in their arguments. Accordingly,
we have also

$$
\phi_k(x) = \rho_k(x) - \rho_{k-2}(x)
\tag{9.17.10}
$$

with the additional abbreviation

$$
\rho_k(x) = \lim_{x_0, \ldots, x_k \to x} \rho_k[x_0, \ldots, x_k]
\tag{9.17.11}
$$

In addition, we have the relation

$$
\phi_k(x) = \lim_{x_k \to x} \frac{x_k - x}{\rho_{k-1}[x, \ldots, x, x_k] - \rho_{k-1}[x, \ldots, x, x]}
\tag{9.17.12}
$$

from (9.17.4) and (9.17.5); and, if the limit on the right exists, it is given by

$$
\frac{1}{\left.\dfrac{\partial \rho_{k-1}[x_0, \ldots, x_{k-1}]}{\partial x_{k-1}}\right|_{x_0 = \cdots = x_{k-1} = x}}
 = \frac{k}{\dfrac{d\rho_{k-1}[x, \ldots, x]}{dx}}
\tag{9.17.13}
$$

in consequence of the symmetry in the arguments, so that (9.17.12) becomes

$$
\phi_k(x) = \frac{k}{\rho_{k-1}'(x)}
\tag{9.17.14}
$$

Thus we may evaluate the coefficients $\phi_k(x_0)$ appearing in (9.17.6) successively,
by using the formulas (9.17.10) and (9.17.14) in the form

$$
\rho_k(x) = \rho_{k-2}(x) + \phi_k(x) \qquad \phi_{k+1}(x) = \frac{k + 1}{\rho_k'(x)}
\tag{9.17.15}
$$

with the obvious starting values

$$
\rho_{-2}(x) = \rho_{-1}(x) = 0 \qquad \phi_0(x) = f(x)
\tag{9.17.16}
$$

and evaluating the functions $\phi_k(x)$ at $x = x_0$. The function $\rho_k(x)$ is often called
the kth reciprocal derivative of f(x). In correspondence with the terminology
of the preceding section, we may refer to $\phi_k(x)$ as the kth inverse derivative
of f(x). Naturally, Eqs. (9.16.5) and (9.16.6) and their implications [such as
(9.16.11)] continue to hold here, with $x_1 = x_2 = \cdots = x_0$.
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In order to illustrate the calculation in a simple case, we consider the 
function f(x) = e*. By using successively the first and second relations in 
(9.17.15) with k = 0, 1, 2,..., we obtain the functions 


oo =e 
po =e φ, =e" 
pr =e” Φ) = --26" 
px = -& φ3 = —3e~* 
p3 = —2e° dy = 26" 
pa = δ" os = 56" 


and so forth. If we take xy = 0 in (9.17.6), we thus obtain the coefficients in 
the expansion 
Bis Ge Nt SM gs, PN yO ee oe ΠΝ cates 

1 529 7.53 2 1: 12 13. 12 
Inspection suggests that the inverse derivatives of e*, of even and odd orders, 
are given by 


Pax) = (--Ἠ1)}2 (vn 2 1) 
Pon+1(X) = (—1)"(2n + De™ 


and the truth of this conjecture is readily established by induction. 
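
The example can be checked numerically. The following is only a sketch: the helper names `thiele_coefficient_exp` and `thiele_convergent` are hypothetical, and the coefficients are taken from the conjectured closed forms above rather than obtained by differentiation. Exact rational arithmetic is used so that the fourth convergent at x = 1 can be compared with the rational function (12 + 6x + x^2)/(12 - 6x + x^2).

```python
from fractions import Fraction
import math

def thiele_coefficient_exp(k):
    """a_k = phi_k(0) for f(x) = e^x, taken from the conjectured closed forms."""
    if k == 0:
        return Fraction(1)
    n, odd = divmod(k, 2)
    if odd:
        return Fraction((-1) ** n * (2 * n + 1))   # phi_{2n+1}(0)
    return Fraction((-1) ** n * 2)                 # phi_{2n}(0), n >= 1

def thiele_convergent(coeffs, x, x0=0):
    """Evaluate a_0 + (x-x0)/(a_1 + (x-x0)/(... + (x-x0)/a_n)), innermost term first."""
    value = coeffs[-1]
    for a in reversed(coeffs[:-1]):
        value = a + (x - x0) / value
    return value

coeffs = [thiele_coefficient_exp(k) for k in range(6)]    # a_0 .. a_5
fourth = thiele_convergent(coeffs[:5], Fraction(1))       # convergent ending with a_4
print(coeffs)
print(fourth, float(fourth) - math.e)
```

Evaluating from the innermost term outward avoids building the numerator and denominator polynomials, at the cost of failing if an intermediate value happens to be zero.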

The expansion (9.17.6) is attributed to Thiele. It is related to the more
general expansions considered previously as the Taylor-series expansion is
related to the divided-difference polynomial interpolation formulas. Whereas
the nth convergent of the confluent expansion (9.17.6) generally affords a better
approximation to f(x) in the immediate neighborhood of $x_0$, the corresponding
convergent of a development which yields exact results at n + 1 points of an
interval including $x_0$ is usually to be preferred for approximation over that
interval.

The nth convergent of the Thiele representation is expressible in the form
$u_p(x)/v_{p-1}(x)$ when n = 2p - 1, where the numerator and denominator are
polynomials of maximum degrees p and p - 1, respectively, whereas that
convergent can be put in the form $u_p(x)/v_p(x)$ when n = 2p, in consequence of
the results of Sec. 9.15. These ratios are sometimes denoted by $R_{p,p-1}(x)$ and
$R_{p,p}(x)$, respectively. More generally, the notation $R_{m,k}(x)$ is sometimes used to
denote a rational function, with numerator of degree m or less and denominator
of degree k or less, which agrees with f(x) at $x_0$ and whose first m + k derivatives
are respectively equal to the first m + k derivatives of f(x) at $x_0$, when such a
function exists.




The approximation $R_{m,k}(x)$ is often called the (m, k) entry in the Padé
table (Padé [1892]) of f(x) relative to $x_0$. For a given value of the sum
n = m + k (which is variously called the index, order, or even degree, of the
approximation), it is usually true that the entry for which k = m - 1 (n odd) or
k = m (n even) gives the most satisfactory approximation to f(x) near $x_0$.
This is the entry given by the nth convergent of the Thiele representation.

The entry $R_{n,0}$ is the sum of the first n + 1 terms of the Taylor expansion
of f(x) about $x = x_0$, while the entry $R_{0,n}$ is the reciprocal of the corresponding
expansion for 1/f(x), assuming in each case the existence of the entry in question.
Other entries, when they exist, could be determined as convergents of confluent
forms of more general continued-fraction representations indicated in Prob. 57.

In addition, any Padé entry (if it exists) also can be obtained directly
from the leading coefficients of the formal representation

$$f(x) = \sum_{\nu=0}^{\infty} c_\nu (x - x_0)^\nu \tag{9.17.17}$$

if it is available, by determining m + 1 a's and k + 1 b's in such a way that

$$\frac{\sum_{\nu=0}^{m} a_\nu (x - x_0)^\nu}{\sum_{\nu=0}^{k} b_\nu (x - x_0)^\nu} - f(x) = O[(x - x_0)^{m+k+1}] \tag{9.17.18}$$

so that the left-hand side and its first m + k derivatives vanish at $x_0$. For this
purpose, we may write $t = x - x_0$ and clear fractions to obtain the requirement

$$\sum_{r=0}^{m} a_r t^r - \left(\sum_{i=0}^{k} b_i t^i\right)\left(\sum_{j=0}^{\infty} c_j t^j\right) = O(t^{m+k+1})$$

With the conventions that $a_r = 0$ unless $0 \le r \le m$, $b_i = 0$ unless $0 \le i \le k$,
and $c_j = 0$ unless $j \ge 0$, this requirement can be put into the more convenient
form

$$\sum_{r=0}^{\infty}\left[a_r - \sum_{i=0}^{k} b_i c_{r-i}\right] t^r = O(t^{m+k+1}) \tag{9.17.19}$$


and hence imposes the conditions

$$\sum_{i=0}^{k} b_i c_{r-i} = a_r \qquad (r = 0, 1, \ldots, m + k)$$

with the same conventions, or, equivalently, the conditions

$$\sum_{i=0}^{k} b_i c_{r-i} = \begin{cases} a_r & (r = 0, 1, \ldots, m) \\ 0 & (r = m + 1, \ldots, m + k) \end{cases} \tag{9.17.20}$$

with the single convention that

$$c_j = 0 \quad \text{when } j < 0$$

The value of the constant $b_0$ can be arbitrarily assigned (say, taken to be 1),
after which the conditions for which r = m + 1, ..., m + k will determine
the other k b's, and the remaining m + 1 conditions will determine the a's, if
the assumed entry exists.
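
The conditions (9.17.20) translate directly into a small linear system. The sketch below is illustrative only: the names `solve_linear` and `pade` are hypothetical, $b_0$ is normalized to 1 as suggested above, exact fractions are used, and no attempt is made to handle the case in which the entry does not exist (the elimination then fails for lack of a nonzero pivot).

```python
from fractions import Fraction
import math

def solve_linear(A, rhs):
    """Gauss-Jordan elimination in exact arithmetic; returns z with A z = rhs."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        # Raises StopIteration if no nonzero pivot exists (singular system).
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

def pade(c, m, k):
    """Return (a, b) for the (m, k) entry from c[0..m+k], with b_0 = 1."""
    cj = lambda j: c[j] if j >= 0 else Fraction(0)    # convention c_j = 0, j < 0
    # Conditions r = m+1..m+k: sum_{i=1}^{k} b_i c_{r-i} = -c_r  (since b_0 = 1)
    A = [[cj(r - i) for i in range(1, k + 1)] for r in range(m + 1, m + k + 1)]
    rhs = [-cj(r) for r in range(m + 1, m + k + 1)]
    b = [Fraction(1)] + (solve_linear(A, rhs) if k else [])
    # Conditions r = 0..m then give the a's directly.
    a = [sum(b[i] * cj(r - i) for i in range(k + 1)) for r in range(m + 1)]
    return a, b

# Taylor coefficients of e^x about 0, and the (2, 2) entry
c = [Fraction(1, math.factorial(j)) for j in range(5)]
a, b = pade(c, 2, 2)
print(a, b)
```

For $e^x$ with m = k = 2 the result, after multiplication of numerator and denominator by 12, is the ratio (12 + 6x + x^2)/(12 - 6x + x^2).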

The expansion (9.17.6) can be generalized usefully as follows. If we
replace the independent variable x by G(x) and write

$$f(G(x)) = F(x) \qquad \rho_k(G(x)) = P_k(x) \qquad \phi_k(G(x)) = \Phi_k(x)$$

the formulas of (9.17.15) become

$$P_k(x) = P_{k-2}(x) + \Phi_k(x) \qquad \Phi_{k+1}(x) = (k + 1)\frac{G'(x)}{P_k'(x)} \tag{9.17.21}$$

with the starting values

$$P_{-2}(x) = P_{-1}(x) = 0 \qquad \Phi_0(x) = F(x) \tag{9.17.22}$$

and (9.17.6) takes the form [compare the Bürmann-series expansion (1.9.10)]

$$F(x) = A_0 + \cfrac{G(x) - G(x_0)}{A_1 + \cfrac{G(x) - G(x_0)}{A_2 + \cfrac{G(x) - G(x_0)}{A_3 + \cdots}}} \tag{9.17.23}$$

where

$$A_k = \Phi_k(x_0) \tag{9.17.24}$$

Here, if the fraction is terminated with $A_n$, $A_n$ is to be replaced by

$$A_n + \frac{G(x) - G(x_0)}{\phi_{n+1}[G(x_0), \ldots, G(x_0), G(x)]}$$

if strict equality is to be preserved.

The result of truncating (9.17.23), and neglecting the residual, is then an
approximation to F(x) in terms of a rational function of G(x), which may be
expected to be useful near $x_0$. It can be used, for example, to determine
approximately the value of F(x) when G(x) takes on a prescribed value, if the
corresponding value of x is unknown but is approximated by $x_0$. The first few Φ's
are readily found to be governed by the equations

$$\Phi_0 = F \qquad \Phi_1 = \frac{G'}{F'} \qquad \Phi_2 = \frac{2G'}{\Phi_1'} \qquad \Phi_3 = \frac{3G'}{F' + \Phi_2'} \qquad \cdots \tag{9.17.25}$$

Thus, for example, if we take $F(x) = e^x$, $G(x) = \sin x$, $x_0 = 0$, we obtain
the representation

$$e^x = 1 + \cfrac{\sin x}{1 + \cfrac{\sin x}{-2 + \cdots}}$$

near x = 0.

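In particular, if we take F(x) = x, we obtain a formula for inverse
interpolation near $x = x_0$. For example, suppose that we require a zero $\bar{x}$
of G(x) and that $x_0$ is a previously determined approximation to $\bar{x}$. Since
then $A_0 = F(x_0) = x_0$, (9.17.23) reduces to the form

$$\bar{x} = x_0 + \cfrac{-G(x_0)}{A_1 + \cfrac{-G(x_0)}{A_2 + \cfrac{-G(x_0)}{A_3 + \cdots}}}$$

or to the more convenient equivalent form

$$\bar{x} = x_0 - \cfrac{\omega_1(x_0)}{1 - \cfrac{\omega_2(x_0)}{1 - \cfrac{\omega_3(x_0)}{1 - \cdots}}} \tag{9.17.26}$$

where

$$\omega_1(x) = \frac{G(x)}{\Phi_1(x)} \qquad \omega_k(x) = \frac{G(x)}{\Phi_{k-1}(x)\,\Phi_k(x)} \quad (k \ge 2) \tag{9.17.27}$$

and where (9.17.21), (9.17.22), and (9.17.24) apply with F(x) = x.
Here the relations of (9.17.25) reduce to

$$\Phi_0 = x \qquad \Phi_1 = G' \qquad \Phi_2 = \frac{2G'}{G''} \qquad \Phi_3 = \frac{3G'}{1 + 2(G'/G'')'} \qquad \cdots \tag{9.17.28}$$

and there follows

$$\omega_1 = \frac{G}{G'} \qquad \omega_2 = \frac{G}{G'}\,\frac{G''}{2G'} \qquad \omega_3 = \frac{G}{G'}\left(\frac{G''}{2G'} - \frac{G'''}{3G''}\right) \qquad \cdots \tag{9.17.29}$$

The leading convergents of (9.17.26) are thus found to be

$$x^{(0)} = x_0 \qquad x^{(1)} = x_0 - \omega_1 \qquad x^{(2)} = x_0 - \frac{\omega_1}{1 - \omega_2} \qquad x^{(3)} = x_0 - \frac{\omega_1(1 - \omega_3)}{1 - \omega_2 - \omega_3} \tag{9.17.30}$$

where the ω's are to be evaluated at $x_0$.†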
† The approximation $x^{(1)}$ is that given by the classical Newton-Raphson procedure
(see Sec. 10.11), the approximation $x^{(2)}$ is attributed to Halley, and the investigation
of the entire sequence appears to have been initiated by Frame.

In illustration, the equation $x^3 - x - 1 = 0$ is easily seen to possess
one real root $\bar{x}$, which lies between x = 1 and x = 2. If we choose the crude
approximation $x_0 = 1$, and set $G(x) = x^3 - x - 1$, there follows $G_0 = -1$,
$G_0' = 2$, $G_0'' = 6$, $G_0''' = 6$, and the successive convergents are found to be
1, 3/2 = 1.5, 9/7 ≈ 1.29, and 75/56 ≈ 1.34. If the process is iterated, starting now
with $x_0 = 1.34$, the successive approximants round to 1.34, 1.3249, 1.324720,
and 1.324718. The result yielded by the last approximation is in fact correct 
to more than the seven digits given. 
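
The figures in this illustration can be reproduced with a few lines of code. The sketch below is an assumption-laden illustration, not part of the text: the function names `convergents` and `derivs` are hypothetical, and the ω's are computed from the expressions for $\omega_1$, $\omega_2$, $\omega_3$ implied by (9.17.27) and (9.17.28), with the convergents formed as in (9.17.30).

```python
from fractions import Fraction

def convergents(x0, G, G1, G2, G3):
    """Return x^(0)..x^(3) of (9.17.30), given G, G', G'', G''' evaluated at x0."""
    w1 = G / G1
    w2 = w1 * G2 / (2 * G1)
    w3 = w1 * (G2 / (2 * G1) - G3 / (3 * G2))
    return [x0,
            x0 - w1,                                  # Newton-Raphson step
            x0 - w1 / (1 - w2),                       # Halley step
            x0 - w1 * (1 - w3) / (1 - w2 - w3)]

def derivs(x):
    """G, G', G'', G''' for G(x) = x^3 - x - 1."""
    return x**3 - x - 1, 3 * x**2 - 1, 6 * x, 6

# First pass from the crude value x0 = 1, in exact rational arithmetic
exact = convergents(Fraction(1), *derivs(Fraction(1)))
print(exact)

# Second pass from x0 = 1.34, in floating point
second = convergents(1.34, *derivs(1.34))
print(second)
```

The first pass uses exact fractions so that the convergents can be compared with the rational values quoted above; the second pass uses ordinary floating-point arithmetic.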

Whereas expressions can be derived for the error of truncation (see 
Frame [1953]), they are too complicated to be generally useful, and one usually 
must attempt to estimate the error in a given approximant by inspecting the 
behavior of the sequence of preceding approximants. 


9.18 Uniformization of Rational Approximations 


Since the nth convergent $r_n(x)$ of the Thiele representation (9.17.6) of f(x) has
the property that $r_n(x)$ and f(x), together with their first n derivatives, agree at
$x_0$, it must be true that

$$f(x) - r_n(x) = O[(x - x_0)^{n+1}] \tag{9.18.1}$$

as x tends to $x_0$. In fact, from Prob. 63 it follows that

$$f(x) - r_n(x) = (-1)^n \frac{(x - x_0)^{n+1}}{N_n(x_0)\,N_{n+1}(x_0)} + \cdots \tag{9.18.2}$$

where omitted terms are of higher order in $x - x_0$.
Also, from (9.16.6), we may deduce that here

$$N_n(x_0) = p_n \tag{9.18.3}$$

with the convenient abbreviation

$$p_k = \begin{cases} a_1 a_2 \cdots a_k & (k \ge 1) \\ 1 & (k = 0) \end{cases} \tag{9.18.4}$$

Thus, with this notation, it follows that

$$f(x) - r_n(x) = (-1)^n \frac{(x - x_0)^{n+1}}{p_n\,p_{n+1}} + \cdots \tag{9.18.5}$$

near $x_0$.

Accordingly, as was to be expected, the magnitude of the error associated
with $r_n(x)$ is small near $x_0$, and it tends to grow in proportion to $(x - x_0)^{n+1}$
with increasing distance from $x_0$. If we suppose that the use of the approximation
is to be restricted to an interval $[x_0 - \epsilon, x_0 + \epsilon]$ centered at $x_0$, and if we write

$$x = x_0 + \epsilon s \qquad s = \frac{x - x_0}{\epsilon} \tag{9.18.6}$$

so that $-1 \le s \le 1$ in that interval, then we have

$$f(x) - r_n(x) = (-1)^n \frac{\epsilon^{n+1} s^{n+1}}{p_n\,p_{n+1}} + \cdots \tag{9.18.7}$$

and hence also

$$\lim_{\epsilon \to 0} \frac{f(x) - r_n(x)}{\epsilon^{n+1}} = (-1)^n \frac{s^{n+1}}{p_n\,p_{n+1}} \tag{9.18.8}$$

with the notation of (9.18.4) and (9.18.6). This limiting relation essentially
characterizes the local behavior of the approximation near $x_0$.

In analogy with preceding treatments of polynomial approximation,
accordingly we may be led to seek a modified rational approximation $r_n^*(x)$
for which (9.18.8) is replaced by the relation

$$\lim_{\epsilon \to 0} \frac{f(x) - r_n^*(x)}{\epsilon^{n+1}} = \frac{(-1)^n}{p_n\,p_{n+1}}\,\frac{T_{n+1}(s)}{2^n} \tag{9.18.9}$$

where $2^{-n}T_{n+1}(s)$ is a monic Chebyshev polynomial, since then, at least over a
sufficiently small interval centered at $x_0$, the error will tend to oscillate with
uniform amplitude rather than to increase steadily in magnitude toward the
ends of the interval.

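One method of attempting this determination begins by writing $r_n^*(x)$
in the form

$$r_n^*(x) = \frac{M_n(x) + \sum_{k=0}^{n-1} \alpha_{k+1}\,\epsilon^{n-k} M_k(x) + \alpha_0\,\epsilon^{n+1}}{N_n(x) + \sum_{k=0}^{n-1} \alpha_{k+1}\,\epsilon^{n-k} N_k(x)} \tag{9.18.10}$$

where the constants $\alpha_0, \alpha_1, \ldots, \alpha_n$ are to be determined, and where the
appropriateness of the powers of ε is to be verified. There then follows

$$f(x) - r_n^*(x) = \frac{[N_n(x) f(x) - M_n(x)] + \sum_{k=0}^{n-1} \alpha_{k+1}\,\epsilon^{n-k}\,[N_k(x) f(x) - M_k(x)] - \alpha_0\,\epsilon^{n+1}}{N_n(x) + \sum_{k=0}^{n-1} \alpha_{k+1}\,\epsilon^{n-k} N_k(x)} \tag{9.18.11}$$

Since (9.18.8) and (9.18.3) imply that

$$\lim_{\epsilon \to 0} \frac{N_k(x) f(x) - M_k(x)}{\epsilon^{k+1}} = \lim_{\epsilon \to 0} \left\{ N_k(x)\,\frac{f(x) - r_k(x)}{\epsilon^{k+1}} \right\} = \frac{(-1)^k s^{k+1}}{p_{k+1}} \tag{9.18.12}$$

we then find that

$$\begin{aligned}
\lim_{\epsilon \to 0} \frac{f(x) - r_n^*(x)}{\epsilon^{n+1}}
&= \frac{1}{p_n}\left[(-1)^n \frac{s^{n+1}}{p_{n+1}} + \sum_{k=0}^{n-1} (-1)^k \frac{\alpha_{k+1}}{p_{k+1}}\,s^{k+1} - \alpha_0\right] \\
&= \frac{(-1)^n}{p_n\,p_{n+1}}\left[s^{n+1} + \sum_{k=0}^{n-1} (-1)^{n+k} \frac{p_{n+1}}{p_{k+1}}\,\alpha_{k+1}\,s^{k+1} + (-1)^{n+1} p_{n+1}\,\alpha_0\right]
\end{aligned} \tag{9.18.13}$$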


Thus, if the coefficient of $s^k$ in $2^{-n}T_{n+1}(s)$ is denoted by $c_k$ (with $c_{n+1} = 1$),

$$2^{-n}T_{n+1}(s) = s^{n+1} + \sum_{k=0}^{n} c_k s^k \tag{9.18.14}$$

the relation (9.18.13) is identified with the desired relation (9.18.9) by taking

$$\alpha_k = (-1)^{n+k+1}\,\frac{p_k}{p_{n+1}}\,c_k$$

or, equivalently,

$$\alpha_k = \frac{(-1)^{n+k+1} c_k}{a_{k+1} a_{k+2} \cdots a_{n+1}} \tag{9.18.15}$$

Hence $r_n^*(x)$ is determined provided only that $a_1, a_2, \ldots, a_{n+1}$ all differ from
zero.

In order to illustrate the process, we consider the function $f(x) = e^x$
with $x_0 = 0$, which was used as an example in the preceding section. Here we
have

$$a_0 = 1 \qquad a_1 = 1 \qquad a_2 = -2 \qquad a_3 = -3 \qquad a_4 = 2 \qquad a_5 = 5$$

and also, by use of (9.16.5) and (9.16.6), we find that

$$M_0(x) = 1 \qquad M_1(x) = 1 + x \qquad M_2(x) = -2 - x$$
$$M_3(x) = 6 + 4x + x^2 \qquad M_4(x) = 12 + 6x + x^2$$

and

$$N_0(x) = 1 \qquad N_1(x) = 1 \qquad N_2(x) = -2 + x$$
$$N_3(x) = 6 - 2x \qquad N_4(x) = 12 - 6x + x^2$$

The fourth Thiele convergent, in particular, is thus

$$r_4(x) = \frac{12 + 6x + x^2}{12 - 6x + x^2}$$

In order to determine $r_4^*(x)$, we note that since

$$T_5(x) = 16x^5 - 20x^3 + 5x$$

there follows

$$c_0 = 0 \qquad c_1 = \tfrac{5}{16} \qquad c_2 = 0 \qquad c_3 = -\tfrac{5}{4} \qquad c_4 = 0$$

Thus, with n = 4, (9.18.15) gives

$$\alpha_0 = 0 \qquad \alpha_1 = \tfrac{1}{192} \qquad \alpha_2 = 0 \qquad \alpha_3 = -\tfrac{1}{8} \qquad \alpha_4 = 0$$

and there follows

$$r_4^*(x) = \frac{12 + 6x + x^2 + (\epsilon^2/8)(2 + x) + \epsilon^4/192}{12 - 6x + x^2 + (\epsilon^2/8)(2 - x) + \epsilon^4/192}$$


Computation shows that the improvement in the approximation to $e^x$
over the interval $[-\epsilon, \epsilon]$ afforded by $r_4^*(x)$ is by no means limited to “small”
values of ε. For ε = 1, the maximum deviation on [-1, 1] is in fact reduced
from about $4 \times 10^{-3}$ to about $2 \times 10^{-4}$, while for ε = 1/2 the maximum
deviation on [-1/2, 1/2] is reduced from about $7 \times 10^{-5}$ to about $4 \times 10^{-6}$.
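
These deviations can be estimated by direct sampling. The following sketch is illustrative only: the function names are hypothetical, and the maximum is taken over an equally spaced grid of 2001 points (an arbitrary choice), so the printed values are estimates of the true maxima rather than exact values.

```python
import math

def r4(x):
    """Fourth Thiele convergent of e^x about x0 = 0."""
    return (12 + 6 * x + x * x) / (12 - 6 * x + x * x)

def r4_star(x, eps):
    """Modified approximation r_4^*(x) intended for the interval [-eps, eps]."""
    num = 12 + 6 * x + x * x + (eps**2 / 8) * (2 + x) + eps**4 / 192
    den = 12 - 6 * x + x * x + (eps**2 / 8) * (2 - x) + eps**4 / 192
    return num / den

def max_deviation(approx, eps, samples=2001):
    """Largest |e^x - approx(x)| on an equally spaced grid over [-eps, eps]."""
    xs = [-eps + 2 * eps * i / (samples - 1) for i in range(samples)]
    return max(abs(math.exp(x) - approx(x)) for x in xs)

results = {}
for eps in (1.0, 0.5):
    plain = max_deviation(r4, eps)
    uniform = max_deviation(lambda x: r4_star(x, eps), eps)
    results[eps] = (plain, uniform)
    print(eps, plain, uniform)
```

The unmodified convergent has its largest error at the right-hand end of the interval, whereas the error of the modified form changes sign several times across the interval.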

Once such an approximation has been obtained, iterative processes (again 
based on an equal ripple theorem) exist for the purpose of generating a sequence 
of additional improvements hopefully converging to an ideal minimax approx- 
imation. As might be expected from the fact that the data enter into the approx- 
imation nonlinearly, the calculation is involved and convergence can be 
guaranteed only when the initial approximation is sufficiently close to the ideal 
one. (See references listed in Sec. 9.19.) 

A simple alternative “trial-and-error” process consists of first determining
a rational approximation $r_n(x)$ of the desired type which agrees with f(x) at
n + 1 selected points inside the interval of interest, by the method of Sec. 9.14
or otherwise, and of calculating and plotting values of the deviation
$f(x) - r_n(x)$. Then, as suggested in Sec. 9.9 for the polynomial case, the points of
collocation may be successively adjusted for the purpose of more nearly attaining
a deviation with at least n + 2 maximum extrema of alternating signs in the
relevant interval.

Similar methods are applicable in the more general case when the rational 
function is to be such that the degree of the numerator does not exceed m and 
the degree of the denominator does not exceed k, where m and k are prescribed. 
This situation reduces to the preceding one, with n = m + k, when k = m 
or when k = m - 1. In the general case, the presence of at least m + k + 2
maximum extrema with alternating signs in the relevant minimax approximation
is guaranteed unless that approximation happens to be such that the degree of 
its numerator is smaller than m and that also the degree of its denominator is 
smaller than k. In this rather unusual event, the guaranteed number is m + 
k + 2 reduced by the lesser of the degree defects of the numerator and 
denominator. 


9.19 Supplementary References 


For more elaborate techniques of discrete harmonic analysis, see Whittaker 
and Robinson [1944], Danielson and Lanczos [1942], Willers [1950], Pollak 
[1947, 1949], and Krylov and Kruglikova [1969]. 

An efficient method of evaluating the coefficients in a complex discrete 
Fourier representation (or of evaluating a discrete Fourier transform) was 
devised by Cooley and Tukey [1965] and has been adapted to other related 
calculations. 

The method of de Prony [1795] for determining exponential approx- 
imations is applied in Whittaker and Robinson [1944] to the numerical solution 
of certain integral equations. A related method is presented in Lanczos [1956]. 
Whittaker and Robinson [1944] also treat other methods for determining 
periodicities, giving collateral references, and Lanczos [1956] advocates a 
numerical “spectroscopic analysis” method for this purpose.

Optimum formulas for interpolation and approximate integration in- 
volving equally spaced abscissas are defined and studied by Sard [1949] and 
by Meyers and Sard [1950a, 1950b].

Approximation by Chebyshev polynomials was given its modern impetus 
by Lanczos [1938, 1952, 1956]. Tables of coefficients for approximating various
functions are given by Clenshaw [1962], together with a treatment of associated 
theory and techniques, and by Clenshaw and Picken [1966] and Luke [1969]. 
See also Minnick [1957], Elliott [1964], and Fox and Parker [1968]. 

Texts dealing with the general theory of approximation and containing 
treatments of some or all of the topics considered in this chapter include 
Achieser [1956], P. J. Davis [1963], Sard [1963], Rice [1964, 1969], Cheney 
[1966], and Rivlin [1969]. Cheney, in particular, presents a 10-page section of 
bibliographic notes tracing the historical development of the principal theories 
and techniques, as well as an extensive bibliography. Classical treatments, 
available in reprint, include de la Vallée Poussin [1919] and Bernstein [1926]. 

Spline approximation in its present form apparently began with Schoenberg 
[1947]. The rapidly expanding literature in this area includes the works of 
Ahlberg, Nilson, and Walsh [1967], Greville [1969], Hayes [1970], Curtis
[1970], and Schoenberg [1971] among many others. Explicit forms of spline
approximations, related to results of Sec. 9.11, are given by Schoenberg [1971]. 
The results of the error analysis in Sec. 9.12 are in accordance with corresponding 
results in Curtis [1970]. 

For general treatments of continued fractions, see Wall [1967] and Perron
[1950]. Classic references include Padé [1892] and Thiele [1909]. See also
Milne-Thomson [1951], Nörlund [1954], and Olds [1962].

The literature on minimax (and near minimax) approximation by poly- 
nomials or rational functions is (to quote someone) everywhere dense. The 
contributions of Maehly, Rémés, Loeb, and many others, as well as those of the 
several authors, may be found in Achieser [1956], Cheney [1966], and most of
the other general references cited above, together with useful bibliographies. 
See also Ralston [1965]. 

For associated compilations and techniques relative to computer approx- 
imations, see Hastings [1957], Kogbetliantz [1960], Hart et al. [1965], Hands- 
comb [1966], and Fike [1968]. 


PROBLEMS


Section 9.2 


1 If f(x) = sin x when $\sin x \ge 0$ and f(x) = 0 when $\sin x \le 0$, obtain the
expansion

$$f(x) = \frac{1}{\pi} + \frac{1}{2}\sin x - \frac{2}{\pi}\left(\frac{\cos 2x}{2^2 - 1} + \frac{\cos 4x}{4^2 - 1} + \frac{\cos 6x}{6^2 - 1} + \cdots\right)$$

Also compare graphically each of the three least-squares approximations
corresponding to retention of harmonics through the second, fourth, and sixth with
the true function over [-π, π].
2 Obtain the expansion

$$x = \frac{\pi}{2} - \frac{4}{\pi}\left(\frac{\cos x}{1^2} + \frac{\cos 3x}{3^2} + \frac{\cos 5x}{5^2} + \cdots\right) \qquad (0 \le x \le \pi)$$

Assuming the validity of this expansion, show that the series represents a
triangular-wave function of period 2π which coincides with f(x) = |x| when $|x| \le \pi$, and
sketch that function. Also compare graphically each of the three least-squares
approximations corresponding to retention of harmonics through the first, third,
and fifth, with the true function over [0, π].

3 Obtain the expansion

$$x(\pi - x) = \frac{8}{\pi}\left(\frac{\sin x}{1^3} + \frac{\sin 3x}{3^3} + \frac{\sin 5x}{5^3} + \cdots\right) \qquad (0 < x < \pi)$$

and sketch the periodic function represented by the expansion. Also compare
graphically each of the first three distinct least-squares approximations with
the true function over [0, π].

4 Show that the squarewave function f(x), which is of period 2π and which is such
that f(x) = -1 when -π < x < 0 and f(x) = +1 when 0 < x < π, possesses
the expansion

$$f(x) = \frac{4}{\pi}\left(\frac{\sin x}{1} + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \cdots\right)$$

and verify that the expansion reduces to the average of the right- and left-hand
limits of the function at its points of discontinuity. Also compare graphically the
first three distinct least-squares approximations with the true function over
[-π, π]. [Let f(x) = 0 when x = 0, ±π.]

5 Show that the mean squared error associated with the approximation (9.2.3) over
[-π, π] is given by

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} f^2\,dx - \left[a_0^2 + \frac{1}{2}\sum_{k=1}^{n}(a_k^2 + b_k^2)\right]$$

whereas the corresponding quantities associated with (9.2.8) and (9.2.10) over
[0, π] are given respectively by

$$\frac{1}{\pi}\int_{0}^{\pi} f^2\,dx - \left(a_0^2 + \frac{1}{2}\sum_{k=1}^{n} a_k^2\right)$$


and 


Also use these results to calculate the RMS errors in each of the least-squares 
approximations considered in Probs. 1 to 4. 
6 Suppose that y(x) is to be of period 2π and is to satisfy the differential equation

$$\alpha y^{iv}(x) + \beta y''(x) + \gamma y(x) = f(x)$$

where α, β, and γ are constants and f(x) is a specified function of period 2π. Show
that if an approximation to y(x) is assumed in the form

$$y(x) \approx a_0 + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx)$$

and is introduced into the differential equation, and if the coefficients are
determined in such a way that the period integral of the square of the difference between
the two sides of the resultant equation is as small as possible, then there follows

$$\gamma a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,dx$$

$$(\alpha k^4 - \beta k^2 + \gamma)\,a_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos kx\,dx \qquad (k \ge 1)$$

$$(\alpha k^4 - \beta k^2 + \gamma)\,b_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin kx\,dx$$

Also use this result to write down a series expansion of the solution of the equation
$y''(x) + \lambda y(x) = f(x)$ which is of period 2π, when f(x) is the squarewave function
defined in Prob. 4 and λ is a constant such that $\lambda \ne 1^2, 3^2, \ldots, (2k + 1)^2, \ldots$.


Section 9.3 

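7 By noticing that the relevant series is geometric, show that

$$\sum_{r=-N+1}^{N} e^{ir\alpha} = \begin{cases} e^{i\alpha/2}\,\dfrac{\sin N\alpha}{\sin (\alpha/2)} & (\alpha \ne 2\nu\pi) \\ 2N & (\alpha = 2\nu\pi) \end{cases}$$

$$\sum_{r=-N+1}^{N} \cos r\alpha = \begin{cases} \cot \dfrac{\alpha}{2}\,\sin N\alpha & (\alpha \ne 2\nu\pi) \\ 2N & (\alpha = 2\nu\pi) \end{cases}$$

and

$$\sum_{r=-N+1}^{N} \sin r\alpha = \sin N\alpha$$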


8 By taking α = mπ/N in the results of Prob. 7, where m is an integer, and writing
$x_r = r\pi/N$, show that

$$\sum_{r=-N+1}^{N} \cos mx_r = \begin{cases} 0 & (m \ne 2\nu N) \\ 2N & (m = 2\nu N) \end{cases}$$

and

$$\sum_{r=-N+1}^{N} \sin mx_r = 0$$

Then, by using the identity $2\cos jx_r \cos kx_r = \cos (j - k)x_r + \cos (j + k)x_r$
and two similar identities, deduce the results of Eqs. (9.3.2) and (9.3.3).
9 Use the results of Probs. 7 and 8 to show that, if the range of summation is
changed to r = 0, 1, ..., N in (9.3.2) and (9.3.3), and if the weighting function $w_r$
is inserted in each summand, where

$$w_r = \begin{cases} \tfrac{1}{2} & (r = 0) \\ 1 & (r = 1, 2, \ldots, N - 1) \\ \tfrac{1}{2} & (r = N) \end{cases}$$

then the right-hand members of all formulas in those equations are to be divided
by the factor 2.
10 Suppose that the ordinates $f_{-N+1}, \ldots, f_{N-1}, f_N$ are empirical, with $f_{-N} = f_N$,
and are subject to independent normal error distributions with zero means and a
common RMS value σ. Show that the corresponding RMS errors associated with
the coefficients calculated from (9.3.6) are given by

$$(\delta A_0)_{\mathrm{RMS}} = (\delta A_N)_{\mathrm{RMS}} = \frac{\sigma}{\sqrt{2N}}$$

$$(\delta A_k)_{\mathrm{RMS}} = (\delta B_k)_{\mathrm{RMS}} = \frac{\sigma}{\sqrt{N}} \qquad (k = 1, \ldots, N - 1)$$

11 Suppose that the ordinates $F_0, F_1, \ldots, F_N$ and $G_1, G_2, \ldots, G_{N-1}$ are empirical
and are subject to independent normal error distributions with zero means
and a common RMS value σ. Use the result of Prob. 9 to show that the
corresponding RMS errors associated with the coefficients of the cosine approximation
to F(x) over [0, π], calculated from (9.3.11), are given by

$$(\delta A_0)_{\mathrm{RMS}} = (\delta A_N)_{\mathrm{RMS}} = \sqrt{\frac{2N - 1}{2N^2}}\,\sigma$$

$$(\delta A_k)_{\mathrm{RMS}} = \sqrt{\frac{2(N - 1)}{N^2}}\,\sigma \qquad (k = 1, \ldots, N - 1)$$

whereas those associated with the coefficients of the sine approximation to G(x)
over [0, π] are given by

$$(\delta B_k)_{\mathrm{RMS}} = \sqrt{\frac{2}{N}}\,\sigma$$

12 The following approximate values of a function f(x), known to be of period 2π, are
available:

x       -π       -5π/6    -2π/3    -π/2     -π/3     -π/6
f(x)    2.077    0.278    -1.014   -0.716   0.051    0.277

x       0        π/6      π/3      π/2      2π/3     5π/6     π
f(x)    1.015    3.031    4.759    4.680    3.689    3.032    2.077

Assuming first that the given values are correct to the number of places given,
determine a trigonometric function of period 2π which agrees with f(x), to those
places, at all tabular points. If it is known that the magnitude of the errors in all
given values cannot exceed 0.005, and that all higher harmonics are negligible,
determine how many of the calculated harmonics can be neglected if the total error
is nowhere to exceed about 0.01. If, instead, it is known only that the approximate
ordinates are subject to error distributions with an RMS value of about 0.0025,
and if all higher harmonics are again assumed to be negligible, use the result of
Prob. 10 to estimate the RMS errors in the calculated coefficients.

13 Determine a seven-term cosine approximation of period 2π to the function f(x) of
Prob. 12 over [0, π], and analyze the results as in Prob. 12.

14 Using the following data, determine a five-term sine approximation of period
2π to the function f(x) over [0, π], and analyze the results as in Prob. 12:

x       0        π/6      π/3      π/2      2π/3     5π/6     π
f(x)    0        1.136    0.864    4.002    6.059    2.868    0


Section 9.4 


15 Repeat the calculations of the illustrative example using the given four-place
values of f(x). Also determine an approximation to the limiting value of f(x) as
x → ∞ and compare it with the true value.

16 Suppose that approximate data are available for a function known to be of the
form $F(t) = Ae^{bt}$, where A and b are unknown constants. Show that the change of
notation

$$\log F(t) = f(t) \qquad \log A = c$$

leads to the linear relation f(t) = c + bt, after which the least-squares methods
of Sec. 7.3 are available for the determination of c and b, and hence of A and b.
Apply this method in the case when the following empirical values of F(t) are
given, and determine whether the result is consistent with the hypothetical fact
that the errors in the given data do not exceed 0.0002 in magnitude:

t       9        12       15       18       21       24       27
F(t)    0.5820   0.4622   0.3672   0.2920   0.2320   0.1843   0.1463

17 Increase each of the given ordinates in Prob. 16 by unity, and suppose that the
resultant ordinates correspond to a function G(t). Show first that the assumption
$G(t) \approx Ae^{bt}$ does not lead to an approximation consistent with the assumed error
bounds. Then, assuming knowledge that the true function G(t) is of the form
$G(t) = A_0 + A_1 e^{bt}$, but without making use of any other information, use
Prony's method [with x = (t - 9)/3] to approximate $A_0$, $A_1$, and b.

18 Given the modified data of Prob. 17, assume an approximation of the form
$G(t) \approx A_0 + A_1 e^{b_1 t} + A_2 e^{b_2 t}$, and use Prony's method to determine the
approximation, showing that a negative value of $e^{3b_2}$ is obtained, so that the third term is
of alternating sign at successive tabular points, and hence presumably is to be
interpreted as “noise” in this case. (Take care to retain sufficiently many digits.)


Section 9.5 


19 Repeat the calculations of the illustrative example, using given values of f(x)
rounded correctly to five decimal places.

20 The following data represent observed values of a certain physical quantity:

t       0        0.05     0.10     0.15     0.20
F(t)    0        0.954    1.527    1.502    0.913

t       0.25     0.30     0.35     0.40     0.45     0.50
F(t)    0.030    -0.752   -1.090   -0.833   -0.091   0.814

The errors in measurement are known not to exceed 0.001. Theory predicts that
the true function F(t) should satisfy the differential equation $MF''(t) + kF(t) = 0$,
where M = 1.40 and k = 248, and hence should be of period P = 0.472. There
is reason to believe that the difference G(t) between the true function and the
function actually subject to measurement satisfies an equation of the form
$G''(t) + c^2 G(t) = 0$ for some constant c. Investigate the plausibility of this
conjecture, and approximate the period of the perturbation G(t).


Section 9.6


21 Plot the function $\pi(x) = x(x^2 - \tfrac{1}{4})(x^2 - 1)$ in [-1, 1], relevant to a
five-point interpolation employing data at equally spaced points, together with the
corresponding functions associated with data prescribed at the zeros of $P_5(x)$
and $T_5(x)$, on a common graph. Also determine the maximum and RMS values
of each of these functions over [-1, 1].

22 Use the Lagrange interpolation formula to determine three parabolic
approximations to $f(x) = e^x$ over [-1, 1], such that $y_1(x)$ agrees with f(x) at
x = -1, 0, and 1, $y_2(x)$ agrees with f(x) at the zeros of $P_3(x)$, and $y_3(x)$ agrees with
f(x) at the zeros of $T_3(x)$. Show also that the errors can be expressed in the forms
$x(x^2 - 1)e^{\xi_1}/6$, $x(x^2 - \tfrac{3}{5})e^{\xi_2}/6$, and $x(x^2 - \tfrac{3}{4})e^{\xi_3}/6$, respectively, where each ξ
is in [-1, 1]. Calculate the actual errors in the three approximations for
x = -1.0(0.2)1.0, plot them on a common graph, and compare them with respect
to approximate maximum and RMS values.


Section 9.7 
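23 Derive (9.7.8) by first obtaining the intermediate results

$$\sum_{r=0}^{n} e^{i(2r+1)\alpha} = \begin{cases} e^{i(n+1)\alpha}\,\dfrac{\sin (n + 1)\alpha}{\sin \alpha} & (\alpha \ne \nu\pi) \\ (-1)^{\nu}(n + 1) & (\alpha = \nu\pi) \end{cases}$$

and

$$\sum_{r=0}^{n} \cos m\theta_r = \begin{cases} 0 & [m \ne 2\nu(n + 1)] \\ (-1)^{\nu}(n + 1) & [m = 2\nu(n + 1)] \end{cases}$$

where α = mπ/[2(n + 1)] and $\theta_r = (2r + 1)\pi/(2n + 2)$, and then using the
identity $\cos j\theta_r \cos k\theta_r = \tfrac{1}{2}\cos (j - k)\theta_r + \tfrac{1}{2}\cos (j + k)\theta_r$.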
24 Determine, to four decimal places, the coefficients in the approximation

$$e^x \approx \sum_{k=0}^{5} a_k T_k(x) \qquad (|x| \le 1)$$

if the approximation is to be exact at the zeros of $T_6(x)$, and show that
the magnitude of the error is smaller than e/23040 ≈ 0.00012 everywhere in
[-1, 1]. Also, recalling that $|T_k(x)| \le 1$ on [-1, 1], obtain upper bounds on
the errors relevant to the least-squares approximations of degrees 2, 3, and 4,
obtained by truncation, and use Eqs. (7.8.10) to express these approximations in
explicit polynomial form.


Section 9.8 


25 Determine two third-degree approximations to $e^x$ over [-1, 1], in addition to
(9.8.17), by truncating the Maclaurin expansion of $e^x$ instead with the $x^4$ term and
with the $x^6$ term, and proceeding by the Lanczos method. Also compare the
error bounds associated with these approximations with each other and with the
corresponding approximation obtained in Prob. 24.

26 Economize the fifth-degree Maclaurin approximation to $e^x$ to a third-degree
approximation over the interval [-ε, ε]. Also specialize to ε = 1/2 and ε = 1/4,
in each case comparing the approximate maximum error with that corresponding
to the third-degree Maclaurin approximation. (Set x = εs, so that $-1 \le s \le 1$.)

27 Obtain an approximation of the form

$$\cos x \approx A_0 + A_2 x^2 + A_4 x^4$$

with an error smaller than $5 \times 10^{-5}$ over [-1, 1].

28 Show that the polynomials $T_k(2x - 1)$ play the same role over [0, 1] as do the
polynomials $T_k(x)$ over [-1, 1] and, with the abbreviation

$$T_k^* \equiv T_k^*(x) = T_k(2x - 1)$$

obtain the relations

$$\begin{aligned}
T_0^* &= 1 \qquad T_1^* = 2x - 1 \qquad T_2^* = 8x^2 - 8x + 1 \\
T_3^* &= 32x^3 - 48x^2 + 18x - 1 \\
T_4^* &= 128x^4 - 256x^3 + 160x^2 - 32x + 1 \\
T_5^* &= 512x^5 - 1280x^4 + 1120x^3 - 400x^2 + 50x - 1
\end{aligned}$$

and

$$\begin{aligned}
1 &= T_0^* \qquad 2x = T_0^* + T_1^* \qquad 8x^2 = 3T_0^* + 4T_1^* + T_2^* \\
32x^3 &= 10T_0^* + 15T_1^* + 6T_2^* + T_3^* \\
128x^4 &= 35T_0^* + 56T_1^* + 28T_2^* + 8T_3^* + T_4^* \\
512x^5 &= 126T_0^* + 210T_1^* + 120T_2^* + 45T_3^* + 10T_4^* + T_5^*
\end{aligned}$$

29 Use the notation and results of Prob. 28 to obtain a third-degree polynomial
approximation to $e^{-x}$ with an error smaller than 0.001 in magnitude over [0, 1].

30 After expressing a five-term (eighth-degree) truncation of the series representation

$$\frac{\sin x}{x} = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \frac{x^6}{7!} + \frac{x^8}{9!} - \cdots$$

in terms of the polynomials $T_k^*(x^2)$ ($0 \le k \le 4$), by use of the results of Prob. 28,
obtain a polynomial approximation to sin x, involving as few terms as possible,
with an error smaller than $10^{-5}$ over [0, 1].

31 The modified Bessel function $K_0(x)$ possesses the asymptotic expansion

$$\sqrt{\frac{2x}{\pi}}\,e^{x} K_0(x) \sim 1 - \frac{1^2}{1!\,8x} + \frac{(1 \cdot 3)^2}{2!\,(8x)^2} - \frac{(1 \cdot 3 \cdot 5)^2}{3!\,(8x)^3} + \cdots$$

in which the error of truncation in the right-hand member is smaller than the
first neglected term. Show that truncation with the $x^{-5}$ term corresponds to an
error smaller than 0.0008 for all $x \ge 3$. Then, after expressing the result of that
truncation in terms of the functions $T_k^*(3/x)$ ($0 \le k \le 5$), obtain an approximation
of the same type over [3, ∞) involving as few terms as possible, with an error not
exceeding 0.005 over that interval.


Section 9.9 


32 Obtain a linear approximation to $e^x$ on [-1, 1] which is near to the minimax
approximation, in the sense that the magnitudes of the three error extrema differ
from each other by less than 0.1, by estimating suitable collocation abscissas
$x_0$ and $x_1$, determining the resultant extrema, and modifying $x_0$ and $x_1$. (For the
purpose of this problem, avoid shortcuts suggested by its simplicity.)

33 Proceed as in Prob. 32, but use a method which deals directly with the extrema.

34 Determine the minimax approximation involved in Prob. 32 analytically by use of
calculus.


Section 9.10 


35 Derive the equation of the spline approximation to f(x) in the subinterval
$[x_{k-1}, x_k]$, in the form

$$s(x) = s''_{k-1}\frac{(x_k - x)^3}{6h_k} + s''_k\frac{(x - x_{k-1})^3}{6h_k} + \left(\frac{f_{k-1}}{h_k} - \frac{h_k s''_{k-1}}{6}\right)(x_k - x) + \left(\frac{f_k}{h_k} - \frac{h_k s''_k}{6}\right)(x - x_{k-1})$$

Also show that

$$s'(x_k-) = \frac{h_k}{6}\,(s''_{k-1} + 2s''_k) + \frac{f_k - f_{k-1}}{h_k}$$

$$s'(x_k+) = -\frac{h_{k+1}}{6}\,(2s''_k + s''_{k+1}) + \frac{f_{k+1} - f_k}{h_{k+1}}$$

and deduce (9.10.6).

36 Show that the end conditions (9.10.11)

$$s''(a) = s''(b) = 0$$

are equivalent to the conditions

$$s'_0 = -\frac{1}{2}s'_1 + \frac{3}{2}\,\frac{f_1 - f_0}{h_1} \qquad s'_n = -\frac{1}{2}s'_{n-1} + \frac{3}{2}\,\frac{f_n - f_{n-1}}{h_n}$$

Also, in the case when n = 3 and the spacing is uniform, show that these end
conditions lead to the spline derivative formulas

$$\begin{aligned}
s'_0 &= \frac{1}{15h}(-19f_0 + 24f_1 - 6f_2 + f_3) \\
s'_1 &= \frac{1}{15h}(-7f_0 - 3f_1 + 12f_2 - 2f_3) \\
s'_2 &= \frac{1}{15h}(2f_0 - 12f_1 + 3f_2 + 7f_3) \\
s'_3 &= \frac{1}{15h}(-f_0 + 6f_1 - 24f_2 + 19f_3)
\end{aligned}$$
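The n = 3 formulas of Prob. 36 can be confirmed by solving the slope equations exactly. The sketch below rests on two assumptions: the usual uniform-spacing continuity relation $s'_{k-1} + 4s'_k + s'_{k+1} = 3(f_{k+1} - f_{k-1})/h$ at interior nodes, and the end relations $2s'_0 + s'_1 = 3(f_1 - f_0)/h$, $s'_{n-1} + 2s'_n = 3(f_n - f_{n-1})/h$ stated in the problem. The names `solve` and `natural_slope_weights` are hypothetical.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        p = next(i for i in range(k, n) if m[i][k] != 0)
        m[k], m[p] = m[p], m[k]
        pivot = m[k][k]
        m[k] = [v / pivot for v in m[k]]
        for i in range(n):
            if i != k:
                factor = m[i][k]
                m[i] = [x - factor * y for x, y in zip(m[i], m[k])]
    return [row[n] for row in m]


def natural_slope_weights(n):
    """Return W with h * s'_i = sum_j W[i][j] * f_j for s''_0 = s''_n = 0."""
    size = n + 1
    a = [[0] * size for _ in range(size)]
    a[0][0], a[0][1] = 2, 1
    a[n][n - 1], a[n][n] = 1, 2
    for k in range(1, n):
        a[k][k - 1], a[k][k], a[k][k + 1] = 1, 4, 1
    columns = []
    for j in range(size):
        f = [0] * size
        f[j] = 1  # unit ordinate at node j isolates the weight of f_j
        rhs = [3 * (f[1] - f[0])]
        rhs += [3 * (f[k + 1] - f[k - 1]) for k in range(1, n)]
        rhs += [3 * (f[n] - f[n - 1])]
        columns.append(solve(a, rhs))
    return [[columns[j][i] for j in range(size)] for i in range(size)]


scaled = [[15 * w for w in row] for row in natural_slope_weights(3)]
for row in scaled:
    print([int(v) for v in row])
```

The same function can be called with other values of n to tabulate the corresponding weights.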


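37 In the case when n = 3 and the spacing is uniform, show that the end conditions
(9.10.12)

$$s'(a) = f'(a) \qquad s'(b) = f'(b)$$

lead to the spline derivative formulas

$$\begin{aligned}
s'_0 &= f'_0 \\
s'_1 &= \frac{1}{5h}(-4f_0 + f_1 + 4f_2 - f_3) - \frac{1}{15}(4f'_0 - f'_3) \\
s'_2 &= \frac{1}{5h}(f_0 - 4f_1 - f_2 + 4f_3) + \frac{1}{15}(f'_0 - 4f'_3) \\
s'_3 &= f'_3
\end{aligned}$$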


38 In the case when n = 3 and the spacing is uniform, show that the periodicity
end conditions (9.10.13) lead to the spline derivative formulas

$$\begin{aligned}
s'_0 &= \frac{1}{h}(f_1 - f_2) \\
s'_1 &= \frac{1}{h}(f_2 - f_0) \\
s'_2 &= \frac{1}{h}(f_0 - f_1) \\
s'_3 &= s'_0
\end{aligned}$$


Section 9.11 


39 Deal with the difference equation

$$u_{k-1} + 4u_k + u_{k+1} = p_k$$

by the following steps.
(a) With the abbreviation

$$P_k = \sum_{r=1}^{k} (-1)^r p_r\, \frac{\sinh (k - r)\alpha}{\sinh \alpha}$$

verify that $u_k = (-1)^{k+1}P_k$ is a particular solution if $\sinh \alpha = \sqrt{3}$, so that,
by superimposing the solution when $p_k = 0$, the general solution is obtained
in the form

$$u_k = (-1)^k (A \sinh k\alpha + B \cosh k\alpha - P_k)$$

(b) Show that when A and B are expressed in terms of $u_0$ and $u_n$, there follows

$$(-1)^k u_k = \frac{1}{\sinh n\alpha}\left[u_0 \sinh (n - k)\alpha + (-1)^n u_n \sinh k\alpha + P_n \sinh k\alpha - P_k \sinh n\alpha\right]$$

(c) Show that

$$(P_n \sinh k\alpha - P_k \sinh n\alpha) \sinh \alpha = \sum_{r=1}^{k} (-1)^r p_r \sinh r\alpha\, \sinh (n - k)\alpha + \sum_{r=k+1}^{n} (-1)^r p_r \sinh k\alpha\, \sinh (n - r)\alpha$$

(d) Deduce (9.11.8) and (9.11.9) from (9.11.4).
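Part (a) of Prob. 39 lends itself to a quick numerical check. The following sketch assumes the difference equation has the form $u_{k-1} + 4u_k + u_{k+1} = p_k$ and takes $\alpha$ from $\cosh \alpha = 2$ (equivalently $\sinh \alpha = \sqrt{3}$); the right-hand members in `p` are arbitrary illustrative values, and the function names are hypothetical.

```python
import math

ALPHA = math.acosh(2.0)  # cosh(alpha) = 2, hence sinh(alpha) = sqrt(3)


def P(k, p):
    """P_k = sum_{r=1}^{k} (-1)^r p_r sinh((k - r) alpha) / sinh(alpha)."""
    total = sum((-1) ** r * p[r] * math.sinh((k - r) * ALPHA) for r in range(1, k + 1))
    return total / math.sinh(ALPHA)


def particular(k, p):
    """Candidate particular solution u_k = (-1)^(k+1) P_k."""
    return (-1) ** (k + 1) * P(k, p)


# p[0] is unused so that p[r] matches the subscript r in the text.
p = [0.0, 1.0, -2.0, 0.5, 3.0, -1.5, 2.5]

residuals = [
    particular(k - 1, p) + 4 * particular(k, p) + particular(k + 1, p) - p[k]
    for k in range(1, len(p) - 1)
]
print(residuals)
```

The residuals vanish to rounding error because $2\cosh\alpha - 4 = 0$ removes every term of the sum except the one contributed by $p_k$ itself.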


40 Whenn = 4, show that the matrix [G,,] is of the form 


and deduce that when the end conditions 
So=fo 54" ἔῃ 
are imposed, the five-point derivative formulas at the nodes are 


Ὁ 
56 


, 
— Sy 


Ee ee © (SHA aren Zi) 
τὴ = 4 Ez nips 5 (ἰδῇ, ΞΡ Hah| 


1 6 
“ἐπ E + 15fk — 2 δι -- ndfs + 1540/3) 


41 When f(x) = x’, show that (9.11.4) becomes 
δι... + 45, + sy_, = 12 


and derive the results of (9.11.17) and (9.11.18). (Use results of Prob. 35 to relate 
values of s’ to values of s”.) 
42 For a periodic spline, show that


_ In-1 + (-1)"9, τ 
Ξε εν is 


, 6 . 
si ἘΣ Σ (—1)'G,,u6f, 
In r=1 
and 
’ + (—1 - ἢ ό ~ 
sy, = MH tCven, , 6 Σ᾽ CDG 1th 
In han 
Then use the relations 
G-;, = —-J, = ε- ἢ G,, = 2 Gait. = . Gt 
In In 


to deduce that 


6 r n—r Tt —1)"9, 
sh = sf = a εἰς LT i lS δ 
h 9n+1 τ Gn-1 -- (-1)"2 


43 Use the result of Prob. 42 to obtain the following periodic-spline derivative 
formulas: 


n=3 5 = ΤΟ - f= =O =) 


ee = SU -A =A 29) 


3 
n=5 = hi ~ a + fs — fa) = 8 2 - fis + fi — 34) 


ἡ τὸ ὁπ Oh - ἢ + fa Ms) =  (2- fs Saf Hx) 


[In each case the expression for the spline derivative at successive nodes is obtained 
by cyclic permutation of the subscripts and accordingly is of the same form at 
each node (see Prob. 38). Notice that in each case h = (ὁ — a)/n = P/n, where P 
is the period of the spline. ] 


Section 9.12 


44 Supply the omitted steps in the determination of A 2, Aq, Bo, and B, in (9.12.3). 

45 Obtain the spline approximation to $f(x) = x^4$ over the interval [a, b] = [0, 2h],
so that n = 2, with the auxiliary conditions $s'_0 = f'(0)$ and $s'_2 = f'(2h)$, in the
form

$$s(x) = \begin{cases} 2hx^3 - h^2x^2 & (0 \le x \le h) \\ 6hx^3 - 13h^2x^2 + 12h^3x - 4h^4 & (h \le x \le 2h) \end{cases}$$

Verify also that, in this case, s(x) is identical with the third-degree Hermite
interpolation polynomial in each subinterval.
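The stated result of Prob. 45 can be checked directly: the two cubic pieces should interpolate $x^4$ at 0, h, and 2h, match $f'$ at the ends, and join with continuous first and second derivatives at x = h. The sketch below does this in exact arithmetic for one illustrative spacing; the function names and the choice h = 1/3 are assumptions made only for the check.

```python
from fractions import Fraction


def left(x, h):
    return 2 * h * x**3 - h**2 * x**2


def right(x, h):
    return 6 * h * x**3 - 13 * h**2 * x**2 + 12 * h**3 * x - 4 * h**4


def d_left(x, h):
    return 6 * h * x**2 - 2 * h**2 * x


def d_right(x, h):
    return 18 * h * x**2 - 26 * h**2 * x + 12 * h**3


def dd_left(x, h):
    return 12 * h * x - 2 * h**2


def dd_right(x, h):
    return 36 * h * x - 26 * h**2


h = Fraction(1, 3)  # illustrative spacing

checks = {
    "interpolates at 0": left(0, h) == 0,
    "interpolates at h": left(h, h) == h**4 and right(h, h) == h**4,
    "interpolates at 2h": right(2 * h, h) == (2 * h) ** 4,
    "end slope at 0": d_left(0, h) == 0,
    "end slope at 2h": d_right(2 * h, h) == 4 * (2 * h) ** 3,
    "s' continuous at h": d_left(h, h) == d_right(h, h),
    "s'' continuous at h": dd_left(h, h) == dd_right(h, h),
}
print(checks)
```

Because the common slope at x = h equals $4h^3 = f'(h)$, each piece also satisfies the Hermite conditions on its own subinterval, which is the identity the problem asks for.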
46 Verify the validity of the error formulas (9.12.4) to (9.12.7) in the special case
considered in Prob. 45. (Notice that here no end effects are present and the 
symbol ~ can be replaced by = in each formula.) 


Section 9.13 


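47 Obtain the following spline derivative formulas corresponding to the auxiliary
conditions $s''_0 = s''_n = 0$:

$$n = 2: \quad \begin{aligned}
s'_0 &= \frac{1}{4h}(-5f_0 + 6f_1 - f_2) \\
s'_1 &= \frac{1}{2h}(-f_0 + f_2) \\
s'_2 &= \frac{1}{4h}(f_0 - 6f_1 + 5f_2)
\end{aligned}$$

$$n = 4: \quad \begin{aligned}
s'_0 &= \frac{1}{56h}(-71f_0 + 90f_1 - 24f_2 + 6f_3 - f_4) \\
s'_1 &= \frac{1}{28h}(-13f_0 - 6f_1 + 24f_2 - 6f_3 + f_4) \\
s'_2 &= \frac{1}{8h}(f_0 - 6f_1 + 6f_3 - f_4) \\
s'_3 &= \frac{1}{28h}(-f_0 + 6f_1 - 24f_2 + 6f_3 + 13f_4) \\
s'_4 &= \frac{1}{56h}(f_0 - 6f_1 + 24f_2 - 90f_3 + 71f_4)
\end{aligned}$$

$$n = 8: \quad s'_4 = \frac{1}{112h}(f_0 - 6f_1 + 24f_2 - 90f_3 + 0f_4 + 90f_5 - 24f_6 + 6f_7 - f_8)$$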


48 Apply the five-point (n = 4) derivative formulas of Prob. 47 to f(x) = sin x on
$[0, \pi]$ and compare the results with those given by (9.13.8), together with (9.13.1)
and (9.13.2). [Notice that here $f''(x) = 0$ at both ends of the interval.]

49 Determine the approximations to each of the integrals

$$\int_0^{\pi} \sin x \, dx = 2 \qquad \int_0^{\pi/2} \sin x \, dx = 1$$


afforded by (9.13.13) and (9.13.15) and also by (9.13.17) and (9.13.19). 


Section 9.14 


50 Show that an inverted difference of the sum u(x) + v(x) generally is not equal
to the sum of the individual inverted differences, that multiplication of f(x) by a
constant c corresponds to multiplication of the nth inverted difference of f(x) by c
if n is even and by 1/c if n is odd, and that the addition of a constant to f(x) does
not affect its inverted differences.

51 Calculate the successive inverted differences $\phi_1[1, x]$, $\phi_2[1, 2, x]$, $\phi_3[1, 2, 3, x]$, ...
for the functions x?, x—?, and x — x74. Then deduce the identity 


5 x—1 


Ἑ χ -- 2 


ῳω [μ᾿ 


χ-- 3 


—12 + 


3 


together with corresponding identities in the other two cases, and verify their 
correctness. 

52 Form an inverted-difference array and use it to determine a function defined by a
finite continued fraction which takes on the following values. Also express the 
result as a simple fraction. 


x 0 1 2 3 4 3 


f@) 3 


53 Replace the given ordinates in Prob. 52 by their three-place rounded values and
repeat the determination, retaining an appropriate number of digits in the inter- 
mediate calculations and obtaining a function defined as a simple fraction whose 
values at the six given points round to the three-place values used. Compare the 
result with that of Prob. 52. 

54 Proceed as in Prob. 52 with the following data:


pm 
ω 
ho 
"ὁ 
[Δ] 
ΜΕΝ 
> 


3 


1 i 15 7 


10. 
| 
| 
| 


x|/O 1 2 3 4 5 
fs | 0 -$ ο ἃ 4 ἡ 
55 If $f(x) = u_m(x)/v_k(x)$ is an irreducible rational function with $u_m(x)$ of exact
degree m and $v_k(x)$ of exact degree k, show that the (2m - 1)th inverted difference
of f(x) is constant if m > k and that the (2k)th inverted difference is constant if
$k \ge m$. [Use (9.15.1).]

56 Determine a rational function of the form (9.15.1), with n < 4, which takes on
the values


fx) | 1 2 3 3 2 
or prove that no such function exists. 
57 Show that the substitution sequence


f(x) = wo(x) Wo(X) = γνχκ(χ)κ) + (ἃ -- X2,)Woe4 1(X) 


X — X2K41 
Won41(X%) = νγγχκα τ(χχκ. 1) ὁ ——2H1 
W~42(%) 


generates the representation 


f(x) = flo) + δια — x0) + Met 5) 

a, + b3(x — x2) + ες AGS Ra) 
dg + bs(x — χα) ἘΠ: 
where a3, = W2,;(X2,;) and bo,41 = W2,41(%2x41), and that the a’s and b’s can be 
determined as leading elements of columns of first divided differences alternating 
with columns of first inverted differences, in each case relative to corresponding 
elements and leading elements of the preceding column. Investigate the form 
of the nth convergent and illustrate the procedure by use of ordinates of the 
function. 

6 — 6x + 3x? 


9) js a ὁ  - 
70 6 -- 5x + 2x? 


at x = 0, 1, 2, 3, 4, and 5. Determine what representation would result if the 
definitions of w,(x) and w,,.4 ,(x) were interchanged. Determine what substitution 
sequence (and what mixed difference table) would generate the representation 


f(x) = f(X%o) + διὰ — Xo) + 
(x — Xo)(x — χα) 
a, + b3(x — X2) + bax — X2)(% — ΧΆ) + bs(x — χγ)α — X3)(X -- χα) + 5: 


and also what sequence (and table) would generate the representation 


f(x) = f(%o) + bi(x -- Xo) + (x — Xo)(x — χα) 
X — Xo 
a» + 
a, + Eee 5 
ay, + 
58 For the data
x 0 1 2 3 4 5 


fo) | 1 1 3 3 ἡ ὦ 


(see Sec. 9.15), form a column of first divided differences followed by columns of 
successive inverted differences (as in Prob. 57) and deduce that 


w 


f(xy) 21+ a ΝΕ 
x — 
ae 
_i4 Xa 
a 39 = 


Also verify that the right-hand member is expressible as (1 + x)/(1 + x7), 


59 For the rounded data 


x 0.1 0.2 0.3 0.4 


16) 9.967 4.933 3.233 2.365 
verify that the column of first inverted differences is nearly linear. Then determine

following columns of successive divided differences, as in Prob. 57, and deduce that 
x — 0.1 

— 0.019865 — 0.09835(x -- 0.2) + 0.0036(x — 0.2)(x — 0.3) 


Also determine an approximation to f(0.15). [Here f(x) = cot x. See the 
example in Sec. 9.16. ] 


I(x) 9.967 + 


Section 9.16 


60 Use the recurrence relations (9.16.5) and (9.16.6) to obtain the successive
convergents $r_k = M_k/N_k$ relevant to the illustrative example [f(x) = cot x, $x_0 = 0.1$,
$x_1 = 0.2$, $x_2 = 0.3$, $x_3 = 0.4$] as follows:


—0.29799 + x 1.03656 -- 0.20100x --2.70932 ~ 0.05529x + x? 


9.967  Ξ““ -- ὃ ὃ Θ΄’ σ΄ " 
— 0.019865 0.00199 + x 0.00059 — 2.7199x 


Also verify that the kth convergent agrees appropriately with cot x at the k + 1 
appropriate points, evaluate the successive convergents at x = 0.05, 0.15, 0.25, 
0.35, and 0.45, and compare the predicted values with the rounded true values. 
61 By eliminating $a_{k+1}$ between (9.16.5) and (9.16.6), show that
Νι-ι (x Ξ =| 

N, Κ Ν, k-1 


and deduce the relation 


Μωι My _ (- 1} (x -- χρ)α -- χι)---Ὃ -- X%) 
Ν, Ng Ni Na+ 1 


Thus, with the notation r, = M,/N,, show that the kth convergent of the con- 
tinued fraction (9.14.9) can be written in the form 


k ’ — — eee — 
r,(X) = 40 + > (-- Lyre (x Xo)(x χα) (x Xn—-1) 
n= Ny—1(X)N,(x) 
62 Use the result of Prob. 61 to show that the kth convergent obtained in Prob. 60
can be obtained also by terminating the expansion 
x — 0.1 (x — 0.1) -- 0.2) 


cot x ~ 9.967 + ------  -ὠ 
~0.019865 (—0.019865)(x + 0.00199) 


(x -- 0.1)(« -- 0.2)(x — 0.3) 
(x + 0.00199)(0.00059 — 2.7199x) 
with the (kK + 1)th term. 
63 Use results of Prob. 61 to show that the error expression (9.16.8) can be written
in the form 
γ το Ce cae ype EL ο-ο---ς. 
(x — x4)Ny—1(x) + NX) Px+1 (Xo, X15 ++ Xs X] 

where 7,(x) = (x — Xo)\(x — x1)---(« — x,). Also show that this relation can 
be rewritten in the form 


f = _ a | eA KDINGOD ὃ ὃΦἝὃἝὅἝὅἝὅΘὃΘῸΘῃῃῃἨ'ὕΨ 
(x) — n(x) = (-1) Nuai(x) + Na(x){ bi +1 [X02 X15 -- +s Χκ x] — 41} 


64 Assuming knowledge of the fact that cot x becomes infinite at x = 0, verify that
the introduction of x = 0 as a fifth abscissa in the text example leads to the
approximate information

$$\phi_3[0.1, 0.2, 0.3, 0] \approx -3.00$$

so that $\phi_3[0.1, 0.2, 0.3, x]$ varies from about -3.00 when x = 0 to about
-2.70 when x = 0.4. Under the assumption that that function increases steadily
over that interval, use the result of Prob. 63 to show that the error in the calculation
of cot 0.15 from the third convergent would be less than about 0.0006 if no
roundoff errors were involved.

65 Repeat the calculations of the text example, using the following improved
(four-place) approximate ordinates:

x        0.1      0.2      0.3      0.4
cot x    9.9666   4.9332   3.2327   2.3652


66 Deal as in Probs. 60, 62, and 64 with the results of Prob. 65.

67 Determine values of the approximate convergents to $K_1(0.3) = 3.056$, where
$K_1(x)$ is a modified Bessel function, which correspond to the successive introduction
of the following rounded values at x = 0.2, 0.4, 0.6, and 0.8:

x         0        0.2      0.4      0.6      0.8
K_1(x)    ∞        4.776    2.184    1.303    0.862


68 Deal as in Probs. 60, 62, and 64 with the results of Prob. 67, obtaining as much
evidence as possible with regard to the accuracy afforded by the convergent $r_3(x)$
over the interval [0, 1]. Also verify the conclusions deduced by comparing cal- 
culated values with the following additional rounded true values: 


x 0.1 0.3 0.5 0.7 0.9 1.0 


Ki(x) 9.854 3.056 1.656 1.050 0.7165 0.5098 


69 Suppose that a rational function R(x) is expressible in the form

$$R(x) = \frac{A_1|}{|x + B_1} + \frac{A_2|}{|x + B_2} + \cdots + \frac{A_n|}{|x + B_n}$$

where none of the A's are zero.

(a) Show that $A_1$, $B_1$, and $A_2$ can be determined by the following sequence of
operations:
Multiply R(x) by x, take limit (= $A_1$) as $x \to \infty$, divide by $A_1$, take reciprocal,
subtract 1, multiply by x, take limit (= $B_1$) as $x \to \infty$, subtract $B_1$, multiply
by x, take limit (= $A_2$) as $x \to \infty$.
(b) Deduce that we can write
A, = D,(0) B, = 4(00) 


where 
q(x) = x ΓΞ Ξ 1 
P(x) 
and 
Ρκει(α) = x[Q,(x) -- gx(00)] 
with 


P(x) = xR(X) 
provided that each A, # 0. (Notice that x may be replaced by 1/t, so that the 
limits are taken at t — 0.) 
70 Generalize the result of Prob. 69 as follows, supposing that R(x) = m(X)/0,(x), 
where u,, and v, are polynomials of degree m and k, respectively: 
(a) If m 2 k, show that we can write 
A,| 


R(x) = Οὐχ + Cyx™ τ 4+. 4+ CL + 
Ix + B, 


if each A, # 0. (Use long division.) 
(δ) If k 2 m + 2, show that we can write 


R(x) = Po 


A,| 
Ix + B, 
if each A, # 0. [First take the reciprocal of R(x).] 
(c) If 4; = 0 in any one of the preceding representations, show that we can 
write the portion beginning with A, in the form 


xem 4 Ey ΧΙ et B+ 


Ix + B, Ix + By, = |w(x) 
where 
r r-—1 K, | 
w(x) = x"  Ο,χ +--+ + G+ ahaa 
Bole tony Oe 


for some integer r = 2. [Take the reciprocal of p j(x) in Prob. 69 and proceed 
as in part ().] 
71 Illustrate the methods of Probs. 69 and 70 by obtaining the following repre- 
sentations: 
(a) ἐν τ ee +114 _ 6 " 2| ‘ 4| 
x° + 9x* + 27x +19 |x +1 |x+3 |x 4+5 


4 3 2 
(b) 2 + 8x” + 20x Pe AOK σεν 52h gaa: 2| Ἷ 4| 
x* + 8x + 19 Ix+3 |x+5 
a 2x + 12 _ 2| 2) 
x + 9x? + 22x Ὁ 29 |x2?43x+4 [x +6 
(d) x? + 6x? + 4x + 29 — | 3| 5| 


> SS ὁ -- + he ha ΚΡΟΣ 
x* + 8x3 + 16x? + 40x +76 |xt+2 |x?+4 |x +6 
Section 9.17
72 Construct the following reciprocal-difference table from the given ordinates: 
x fps P2 Ps Pa 
0 2 
—2 
1 3. 5 
ον. -2 
7 
2 4 —i 0 
—10 14 
3 1 -τῖς 0 
4 6 — 1 oP 
17 442 28 
ieee 




WN 
fo, 


From this table, rederive (9.14.15) and also obtain the representation 


4 χ-- 2 2+x 
r(x) = -—- + ————— = 5 
5 τς x — 3 1+x 
he were 
52, x-4 
ae 
4 


corresponding to a zigzag difference path launched from x = 2. 
73 Obtain the formal expansion

inde 2a ee ire mene 

1 2 3 |i [5 

Also show that ¢2,(x) = 2/n (n 2 1) and φχ,..1(Χ) = (2n + 1)(1 + x), so that 
the (2n)th coefficient is 2/n and the (2n + 1)th is 2m + 1. 
74 Use (9.16.5) and (9.16.6) to show that the leading convergents of the expansion
obtained in Prob. 73 are given by 


0 2x 6x + x? 6x + 3x? 
2 4X 6 + 4x 6+ 6x + x? 
30x + 21x? + x? 60x + 60x? + 11x3 


30 + 36x + 9x? 60 + 90x + 36x? + 3x3 


and determine the successive approximations to log 4 and log 2. Also use the 
result of Prob. 61 to show that these convergents are the partial sums of the formal 
expansion 


2 x? 


x 
a eee ge ee 
2+x (24+ x)(6+ 4x) 


x* x 


— --- ὁ’. -- - 
(6 + ἀχ)ί(6 + 6x + χ2) (6 + 6x + x7)(30 + 36x + 9χ2) 


x® 


ee nn - θὀὀἠὀ»Ῥ-Ῥ͵Ἐ--..ς.ς. + 
(30 + 36x + 9x?)(20 + 30x + 12x? + x) 


log(i+x)=0O0+%x- 


75 Obtain the formal representation


χε τ Ξε 


ἄλευρον 
4 [12 IJ [ae |-4 


and express the first five convergents as simple fractions. Also use the result of 
Prob. 61 to show that the representation can be expressed in the form 


π x-1 (@-—1) | (x — 1)° 
4 2 244+x) 414+%x2+x) 
(x — 1)* 


76 Obtain the formal representations


Geigy? See Aha he x| 
2c [2ς [26 
and 
ee ee τς 
ce |2c% |2/3e |18c% [2|15ς 


where c is a positive constant. 
77 With the notation


1 
A, = τ Mo) 
use (9.17.15) to show that the first five Thiele coefficients in (9.17.6) are given by 


dg = Ao αι ΞΞ --- α) ΞὄΌῦᾧ -- -- 


432 _ (4,453 - 43} 


3 = -------Ξ-Ξ-Ξ-ΦΞΞ--ΞΞΦΞ--ΞΞ:..--..ω..-:- a4 = σῦς Θ᾿ 
Α4,(4,43 -- 43) A,(A,A4 -- 43) 


where a, = φμί(χο). Deduce also that if a function possesses the formal Taylor 
expansion 


Ay + Ayu + Anu? -Ὁ -«' 
and the formal Thiele expansion 


then the leading a’s can be calculated from the leading A’s by use of these relations. 
78 Show that the expansion (9.17.6) is nonexistent when f(x) = cos x and $x_0 = 0$,
but that, when F(x) = cos x, $G(x) = x^2$, and $x_0 = 0$, the leading terms of the
expansion (9.17.23) are of the form
eee a αὐ μος 
|I-2 |-6 [2 


Also obtain this form from the result of Prob. 77 with u = x2. 
79 Obtain the leading terms of an expansion of sin x analogous to that obtained in
Prob. 78 by taking F(x) = (sin x)/x, $G(x) = x^2$, and $x_0 = 0$, and also by using
the result of Prob. 77.

80 Determine the Padé entries $R_{4,0}(x)$, $R_{3,1}(x)$, $R_{2,2}(x)$, $R_{1,3}(x)$, and $R_{0,4}(x)$ for
f(x) = log (1 + x) at x = 0, by use of (9.17.17) and (9.17.20) or otherwise.
Also verify that $R_{2,2}(x)$ agrees with the corresponding ratio in Prob. 74.

81 For each of the approximations obtained in Prob. 80, tabulate the error by tenths
for $-0.5 \le x \le 0.5$. Then compare their (approximate) maximum errors as
well as their (approximate) RMS values over that interval.

82 The equation $x - e^{-x} = 0$ possesses a real root between x = 0.5 and 0.6.
Making use of the fact that $e^{-0.6} = 0.548812$, determine that root to five decimal
places.

83 Determine the root of the equation $x^4 - 3x + 1 = 0$ between x = 1.3 and 1.4
to five decimal places.


Section 9.18 


84 Proceed as in the example of Sec. 9.18 with the approximation

$$\log (1 + x) \approx \frac{6x + 3x^2}{6 + 6x + x^2}$$

obtaining a more nearly uniform approximation over the interval $[-\epsilon, \epsilon]$ where
$0 < \epsilon < 1$. Also plot the error on that interval when $\epsilon = 0.25$ and when $\epsilon = 0.5$
and, in each case, compare the approximate maximum error with that associated
with the original approximation on that interval.
85 Proceed as in Prob. 84 with the approximation
ΠΝ ge a 
|-2 |-6 10 
(see Prob. 78), taking x” as the basic variable and making the error approximately 
proportional to 7,(s) when x? = es (0 Ξ s Ξ 1). Specialize when δ = 0.5 
and when ¢ = I. 


10


NUMERICAL SOLUTION OF EQUATIONS 


10.1 Introduction 


This chapter summarizes a number of methods which are available for the 
numerical solution of sets of linear algebraic equations (Secs. 10.2 to 10.9), 
nonlinear algebraic or transcendental equations in general (Secs. 10.10 to 10.13), 
and nonlinear algebraic equations in particular (Secs. 10.14 to 10.19). With 
only minor exceptions, the treatments are independent of the content of pre- 
ceding chapters. 


10.2 Sets of Linear Equations 


A brief summary of terminologies and of some elementary results, relative to 
solutions of sets of linear algebraic equations, is presented in this section. We 
suppose first that we are concerned with a set of n equations relating n unknowns 
$x_1, x_2, \ldots, x_n$, and expressed in the form

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= c_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= c_2 \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= c_n
\end{aligned} \tag{10.2.1}$$

where the $n^2$ coefficients $a_{ij}$ and the n right-hand members $c_i$ are prescribed.
Here $a_{ij}$ represents the coefficient of $x_j$ in the ith equation of the set.

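The left-hand members may be specified by the square array of the coefficients,
known as the coefficient matrix,

$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\cdots & \cdots & \cdots & \cdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix} \tag{10.2.2}$$

whereas the complete set of equations may be specified by the array

$$M = \left[\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & c_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & c_2 \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} & c_n
\end{array}\right] \tag{10.2.3}$$

known as the augmented matrix and formed by adjoining the column c of
right-hand members to the n columns in (10.2.2).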

The minor of any element $a_{ij}$ in the coefficient matrix A is defined as the
value of the determinant† of the square array obtained by deleting the ith row
and the jth column of the coefficient matrix. The cofactor of $a_{ij}$, to be denoted
here by $A_{ij}$, is defined as the result of changing the sign of the minor i + j
times. It is then true that, if each element of any column of the coefficient matrix
is multiplied by its cofactor, then the sum of these n products is the value of the
determinant of that matrix. Furthermore, if each element of any column is
multiplied by the cofactor of the corresponding element of any other column,
then the sum of these n products is zero. Both statements also remain true if the
word column is replaced by row throughout.

These facts permit the direct elimination of all unknowns except an
arbitrarily chosen one, say, $x_k$, from (10.2.1). For if we multiply each equation
by the cofactor in A of the coefficient of $x_k$ in that equation and add the results,
they lead immediately to the consequence that the result is of the form

$$Dx_k = c_1A_{1k} + c_2A_{2k} + \cdots + c_nA_{nk} \tag{10.2.4}$$

where D represents the value of the determinant of the coefficient matrix.
Thus we may deduce that, if $D \ne 0$, and if (10.2.1) possesses a solution, then
that solution is unique, and each $x_k$ (k = 1, 2, ..., n) is obtained from (10.2.4)
by division by D.


† The definition and elementary properties of determinants are assumed.
It is then easily shown, by direct substitution, that the x's so obtained
actually do satisfy (10.2.1). For since (10.2.1) is of the form

$$\sum_{k=1}^{n} a_{ik}x_k = c_i \qquad (i = 1, 2, \ldots, n)$$

and since (10.2.4) is expressible in the form

$$Dx_k = \sum_{j=1}^{n} A_{jk}c_j \qquad (k = 1, 2, \ldots, n)$$

the result of substituting (10.2.4) into the left-hand member of the ith equation
of (10.2.1) is

$$\frac{1}{D}\sum_{k=1}^{n} a_{ik} \sum_{j=1}^{n} A_{jk}c_j = \frac{1}{D}\sum_{j=1}^{n}\left(\sum_{k=1}^{n} a_{ik}A_{jk}\right)c_j$$

Since the inner sum in the second form is the sum of the products of the elements
in the ith row of the coefficient matrix and the cofactors of corresponding
elements in the jth row, it is equal to D when j = i and is zero when $j \ne i$,
so that this quantity properly reduces to $c_i$.

Since the right-hand member of (10.2.4) would reduce to D if $c_1, c_2, \ldots, c_n$
were replaced by $a_{1k}, a_{2k}, \ldots, a_{nk}$, it follows also that the right-hand member
of (10.2.4) is the value of the determinant of the matrix obtained from the coefficient
matrix by replacing the column of coefficients of $x_k$ by the column of right-hand
members of (10.2.1). If we denote the value of this determinant by $D_k$,
the solution of (10.2.1) can be written in the simple form

$$x_k = \frac{D_k}{D} \qquad (k = 1, 2, \ldots, n) \tag{10.2.5}$$

if $D \ne 0$. This result is known as Cramer's rule.
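It is convenient to write

$$\tilde{A}_{ij} = \frac{A_{ij}}{D} \tag{10.2.6}$$

when $D \ne 0$, and to speak of this ratio as the reduced cofactor of $a_{ij}$ in the
matrix A. With this notation, the result of writing out (10.2.4) for k = 1,
2, ..., n is of the form

$$\begin{aligned}
x_1 &= \tilde{A}_{11}c_1 + \tilde{A}_{21}c_2 + \cdots + \tilde{A}_{n1}c_n \\
x_2 &= \tilde{A}_{12}c_1 + \tilde{A}_{22}c_2 + \cdots + \tilde{A}_{n2}c_n \\
\cdots & \cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots \\
x_n &= \tilde{A}_{1n}c_1 + \tilde{A}_{2n}c_2 + \cdots + \tilde{A}_{nn}c_n
\end{aligned} \tag{10.2.7}$$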
This set of relations is thus the result of "inverting" the relations (10.2.1),
when $D \ne 0$.
The array of coefficients of the right-hand members in (10.2.7),

$$[\tilde{A}_{ji}] = \begin{bmatrix}
\tilde{A}_{11} & \tilde{A}_{21} & \cdots & \tilde{A}_{n1} \\
\tilde{A}_{12} & \tilde{A}_{22} & \cdots & \tilde{A}_{n2} \\
\cdots & \cdots & \cdots & \cdots \\
\tilde{A}_{1n} & \tilde{A}_{2n} & \cdots & \tilde{A}_{nn}
\end{bmatrix} \tag{10.2.8}$$

thus may be called the inverse of the coefficient matrix of (10.2.1) in that case,
in the sense that whereas the array (10.2.2) specifies a transformation of the x's
into the c's, the array (10.2.8) specifies the inverse transformation of the c's
back into the x's.

We notice that the inverse of (10.2.2) can be obtained by first replacing 
each element of (10.2.2) by its reduced cofactor (cofactor divided by D), and 
then interchanging rows and columns. 
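The cofactor relations, Cramer's rule (10.2.5), and the inverse (10.2.8) can be illustrated together in a few lines. The following is a minimal sketch in exact arithmetic; the function names and the 3 x 3 example system are illustrative assumptions, and the recursive determinant is meant only for small n.

```python
from fractions import Fraction


def minor(a, i, j):
    """Array left after deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def det(a):
    """Determinant by cofactor expansion along the first column."""
    if not a:
        return 1
    return sum((-1) ** i * a[i][0] * det(minor(a, i, 0)) for i in range(len(a)))


def cofactor(a, i, j):
    """Minor of a[i][j] with its sign changed i + j times."""
    return (-1) ** (i + j) * det(minor(a, i, j))


def cramer(a, c):
    """x_k = D_k / D, where D_k has column k replaced by the right-hand members."""
    d = det(a)
    if d == 0:
        raise ValueError("D = 0: Cramer's rule does not apply")
    n = len(a)
    return [
        Fraction(det([row[:k] + [c[i]] + row[k + 1:] for i, row in enumerate(a)]), d)
        for k in range(n)
    ]


def inverse(a):
    """Reduced cofactors with rows and columns interchanged."""
    d = det(a)
    n = len(a)
    return [[Fraction(cofactor(a, j, i), d) for j in range(n)] for i in range(n)]


A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]  # illustrative coefficients
c = [8, -11, -3]

x = cramer(A, c)
A_inv = inverse(A)
print(x)
print([sum(A_inv[i][j] * c[j] for j in range(3)) for i in range(3)])
```

Both printed lines should agree, since applying the inverse array to the c's is exactly the set of relations (10.2.7).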

In order to describe the situation in the case D = 0, as well as the case in 
which there are m equations relating n unknowns, where $m \ne n$, it is desirable to
define the rank of any rectangular matrix with, say, m rows and n columns. 
From any such array, we may form a number of square subarrays, by deleting 
certain rows and/or columns, and compressing the remaining elements into a 
compact arrangement. The largest such subarrays would be of an order equal 
to the smaller of the integers m and n; the smallest would be of order 1 and would 
consist of only a single element. The rank of the given matrix is defined as the 
order of the largest such subarray whose determinant is not zero. 

The basic theorem relevant to the existence of a solution of a set of m 
equations in n unknowns states simply that the system possesses a solution if and
only if the rank of the coefficient matrix is equal to the rank of the augmented 
matrix. 

Suppose that the ranks are equal, and let the common rank be r. Then r 
is not greater than the smaller of m and n. Now there exists at least one square 
r x r subarray in the coefficient matrix whose determinant is not zero. If one 
such subarray is found, and if r < m, then it can be proved that the m — r 
equations whose coefficients are not involved in that subarray are implied by the 
other r equations and can be suppressed. If r = n, then n equations in n
unknowns remain and can be solved uniquely by Cramer's rule. However, if
r < n, then the n - r unknowns in the r remaining equations whose coefficients
are not involved in the subarray can each be assigned completely arbitrary values,
after which the remaining r unknowns can be determined in terms of them by
Cramer's rule. Thus, the general solution of the system then involves n - r
arbitrary constants, and the system is said to be of defect n - r, since it fails to
determine a unique solution by permitting n - r degrees of freedom.
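The rank criterion can be tried on small cases. The sketch below computes rank by exact row elimination, which is assumed here to agree with the determinantal definition given above (the order of the largest square subarray with nonzero determinant); the function names and the sample systems are illustrative.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a rectangular array, found by exact forward elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def defect(a, c):
    """Return n - r if the system is solvable, or None if the ranks differ."""
    r = rank(a)
    r_aug = rank([row + [ci] for row, ci in zip(a, c)])
    if r != r_aug:
        return None
    return len(a[0]) - r


print(defect([[1, 2], [2, 4]], [3, 6]))   # dependent equations
print(defect([[1, 2], [2, 4]], [3, 7]))   # inconsistent equations
print(defect([[1, 1], [1, -1]], [2, 0]))  # D != 0
```

A defect of zero corresponds to a unique solution, a positive defect to that many arbitrary constants in the general solution, and `None` to an unsolvable set.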

The cases of most frequent practical interest are those represented by 
(10.2.1), in which m = n. In particular, if all the right-hand members $c_i$ are
zeros, the set is said to be homogeneous. In this case, the coefficient matrix and
augmented matrix are automatically of the same rank, so that the set has a
solution. In fact, one solution is then always the trivial one, for which
$x_1 = x_2 = \cdots = x_n = 0$. If $D \ne 0$, this is the only possible solution. Usually this
solution is of no interest, and the important homogeneous sets are those for 
which D = 0, so that the system admits nontrivial solutions. On the other hand, 
if at least one of the right-hand members is not zero, so that the trivial solution 
is not admissible, the interest centers mainly on the cases when a nontrivial 
solution exists and is unique, so that D # 0. Attention will be restricted prin- 
cipally to this last case in the following sections. 

Before proceeding, it may be pointed out that, whereas the preceding 
results are of basic theoretical importance and are essential to an understanding 
of the nature of sets of linear equations, the use of Cramer’s rule in actual 
numerical cases is generally highly inefficient, because of the excessive labor 
involved in the evaluation of n + 1 determinants of order n, unless n is small.
Many other methods have been devised, certain of which are described in the 
following sections. 


10.3 The Gauss Reduction 


In principle, the simplest practical method of solving the set (10.2.1) is one due 
to Gauss. In this reduction, the first equation is first divided by $a_{11}$ (assuming
that $a_{11} \ne 0$) and the result is used to eliminate $x_1$ from all succeeding equations.
Next, the modified second equation is divided by the coefficient of $x_2$
in that equation, and the result is used to eliminate $x_2$ from the succeeding
equations, and so forth. After this elimination has been effected n times,
when $D \ne 0$, the resultant set, which is equivalent to the original one except for
the effects of any roundoffs committed, is of the form

$$\begin{aligned}
x_1 + a'_{12}x_2 + a'_{13}x_3 + \cdots + a'_{1n}x_n &= c'_1 \\
x_2 + a'_{23}x_3 + \cdots + a'_{2n}x_n &= c'_2 \\
\cdots\cdots\cdots\cdots\cdots & \\
x_n &= c'_n
\end{aligned} \tag{10.3.1}$$

where $a'_{ij}$ and $c'_i$ designate specific numerical values, and the solution is completed
by working backward from the last equation, to obtain successively
$x_n, x_{n-1}, \ldots, x_1$. It is convenient to work with the augmented arrays at each
stage, rather than to write out each equation in full. A renumbering of equations
and/or variables will be necessary if, at any stage, the coefficient of $x_k$ in the kth
equation is zero, and it is desirable if that coefficient is small relative to other 
coefficients in that equation, in order that the effects of roundoff errors may be 
minimized. (See Sec. 10.5.) 

The exceptional cases in which D = 0 would evidence themselves through 
the fact that after r such eliminations, where r is the rank of the coefficient 
matrix, all coefficients in the n — r succeeding equations would vanish (except 
for the errors due to roundoff). Unless all right-hand members of those equations 
also were reduced to zeros at that stage, the original set would be unsolvable. 
If all those members were zeros, the final n - r equations would have been
reduced to the form 0 = 0, and hence would be ignorable. The rth equation
would express $x_r$ as the sum of a specified constant and a certain linear combination
of $x_{r+1}, \ldots, x_n$, and the process of back substitution would finally
express $x_1, x_2, \ldots, x_r$ in similar forms.

In illustration, we consider the three equations 


$$\begin{aligned}
9.3746x_1 + 3.0416x_2 - 2.4371x_3 &= 9.2333 \\
3.0416x_1 + 6.1832x_2 + 1.2163x_3 &= 8.2049 \\
-2.4371x_1 + 1.2163x_2 + 8.4429x_3 &= 3.9339
\end{aligned} \tag{10.3.2}$$

The reduced equations, corresponding to (10.3.1), are obtained in the form

$$\begin{aligned}
x_1 + 0.32445x_2 - 0.25997x_3 &= 0.98493 \\
x_2 + 0.38624x_3 &= 1.00246 \\
x_3 &= 0.61448
\end{aligned} \tag{10.3.3}$$

if five decimal places are retained, and the "back solution" yields the values

$$x_1 = 0.89643 \qquad x_2 = 0.76512 \qquad x_3 = 0.61448 \tag{10.3.4}$$
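The reduction of (10.3.2) can be reproduced with a short routine. The sketch below follows the steps described above (divide by the leading coefficient, eliminate from succeeding equations, then back-substitute), but it works in ordinary double precision rather than with five retained decimals, so its output should agree with (10.3.3) and (10.3.4) only to about that many places. The function names are hypothetical, and no renumbering of equations is attempted.

```python
def gauss_reduce(augmented):
    """Reduce an augmented array to the unit upper-triangular form (10.3.1)."""
    n = len(augmented)
    m = [row[:] for row in augmented]
    for k in range(n):
        pivot = m[k][k]  # assumed nonzero; no interchanges in this sketch
        m[k] = [v / pivot for v in m[k]]
        for i in range(k + 1, n):
            factor = m[i][k]
            m[i] = [a - factor * b for a, b in zip(m[i], m[k])]
    return m


def back_substitute(m):
    """Work backward from the last reduced equation."""
    n = len(m)
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = m[k][n] - sum(m[k][j] * x[j] for j in range(k + 1, n))
    return x


system = [
    [9.3746, 3.0416, -2.4371, 9.2333],
    [3.0416, 6.1832, 1.2163, 8.2049],
    [-2.4371, 1.2163, 8.4429, 3.9339],
]

reduced = gauss_reduce(system)
solution = back_substitute(reduced)
for row in reduced:
    print([round(v, 5) for v in row])
print([round(v, 5) for v in solution])
```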


A discussion of the reliability of these results is deferred to later sections. 
This method is known as the Gauss reduction. A modification, known as 
the Gauss-Jordan reduction, consists of using the kth equation, at the kth stage, 
to eliminate x, from the preceding equations as well as the following ones, so 
that the solution is obtained after n (or less) eliminations and no back substitution
is necessary. However, simple analysis shows that the Gauss-Jordan
reduction involves about $\tfrac{1}{2}n^3$ multiplications and divisions whereas the Gauss
reduction involves about $\tfrac{1}{3}n^3$. Hence the latter is to be preferred on this basis.
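The two operation counts can be tallied stage by stage. The sketch below assumes one particular bookkeeping convention: at stage k, the n - k + 1 entries to the right of the leading coefficient (including the right-hand member) are each divided once, and each is multiplied once for every equation from which $x_k$ is eliminated; back substitution adds n(n - 1)/2 multiplications for the Gauss reduction. Other conventions change only the lower-order terms. The function names are hypothetical.

```python
def gauss_ops(n):
    """Multiplications and divisions in the Gauss reduction with back substitution."""
    ops = 0
    for k in range(1, n + 1):
        width = n - k + 1          # entries right of the pivot, incl. right-hand member
        ops += width               # divisions normalizing equation k
        ops += (n - k) * width     # eliminating x_k from the succeeding equations
    ops += n * (n - 1) // 2        # back substitution
    return ops


def gauss_jordan_ops(n):
    """Multiplications and divisions in the Gauss-Jordan reduction."""
    ops = 0
    for k in range(1, n + 1):
        width = n - k + 1
        ops += width               # divisions normalizing equation k
        ops += (n - 1) * width     # eliminating x_k from all other equations
    return ops


for n in (3, 10, 100):
    print(n, gauss_ops(n), gauss_jordan_ops(n))
```

Under this convention the totals are $(n^3 + 3n^2 - n)/3$ and $n^2(n + 1)/2$, whose leading terms are the $\tfrac{1}{3}n^3$ and $\tfrac{1}{2}n^3$ quoted above.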
In practice, only the coefficients are recorded at the successive stages of
the reduction, the array corresponding to the first stage intermediate between
(10.3.2) and (10.3.3) thus being of the form

$$\left[\begin{array}{ccc|c}
1 & 0.32445 & -0.25997 & 0.98493 \\
  & 5.19635 & 2.00702 & 5.20914 \\
  & 2.00702 & 7.80933 & 6.33427
\end{array}\right]$$


The necessity of introducing new arrays at each of the intermediate stages is 
time-consuming and conducive to gross errors, particularly when many equa- 
tions are involved. In the following section, a more efficient technique is 
described. 


10.4 The Crout Reduction 


A modification of the Gauss reduction, which has the advantages that it is 
particularly well adapted to the use of desk calculators and of large-scale 
computers, and that the recording (or storage) of auxiliary data (such as the 
repeated rewriting of modified equations or arrays) is minimized, is due to 
Crout.†

Starting with the augmented matrix M of the original system 


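$$
M = \left[\begin{array}{cccc|c}
\underline{a_{11}} & a_{12} & \cdots & a_{1n} & c_1 \\
a_{21} & \underline{a_{22}} & \cdots & a_{2n} & c_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{n1} & a_{n2} & \cdots & \underline{a_{nn}} & c_n
\end{array}\right] = [\,A \mid c\,] \tag{10.4.1}
$$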


which may be considered as being partitioned into the coefficient array A and 
the c column, one determines next the elements of an auxiliary matrix M’ of 
the same dimensions 


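$$
M' = \left[\begin{array}{cccc|c}
\underline{a'_{11}} & a'_{12} & \cdots & a'_{1n} & c'_1 \\
a'_{21} & \underline{a'_{22}} & \cdots & a'_{2n} & c'_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a'_{n1} & a'_{n2} & \cdots & \underline{a'_{nn}} & c'_n
\end{array}\right] = [\,A' \mid c'\,] \tag{10.4.2}
$$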


which may be considered as being partitioned, in the same way, into a square
array A’ and a c’ column. From this matrix, one then obtains a solution column
x whose elements are the required values of $x_1, \ldots, x_n$,

$$
x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \tag{10.4.3}
$$

† See Crout [1941], in which modifications which are convenient when the coefficients
are complex are also given. Similar methods are attributed to Doolittle,
Banachiewicz, Cholesky, and others.

Each entry in (10.4.2) and (10.4.3) is obtained from previously calculated data 
by a continuous sequence of operations, which can be effected without the 
tabulation of intermediate data. 

In order to describe the reduction in a simple way, it is convenient to 
introduce two definitions. First, the diagonal elements (or elements on the 
principal diagonal) of a matrix are those elements whose row and column 
indices are equal, and which are underlined in (10.4.1) and (10.4.2). Second, 
the inner product of a row into a column, each containing n elements, is defined 
as the sum of the n products of corresponding elements, the elements of a row 
being ordered from left to right, and the elements of a column from head to foot. 

The n elements of the first column of the auxiliary matrix (10.4.2) are
determined first, then the remaining n of the n + 1 elements of the first row.
Next, the remaining n − 1 elements of the second column and of the second row
are determined, then the remaining n − 2 elements of the third column and
third row, and the process is continued until the array is filled.

The elements of the first column of M’ are identical with the corresponding
elements of M; the remaining elements of the first row of M’ (to the right of the
diagonal element $a'_{11}$) are each obtained by dividing the corresponding element
of M by the diagonal element $a_{11}$. Thus, for example, $a'_{11} = a_{11}$, $a'_{21} = a_{21}$,
and $a'_{12} = a_{12}/a_{11}$. From this stage onward, the elements of M’ are calculated,
in the order specified above, according to two rules:


1 Each element on or below the principal diagonal in M’ is obtained 
by subtracting from the corresponding element in M the inner product of 
its own column and its own row in the square subarray A’, with all 
uncalculated elements imagined to be zeros. 

2 Each element to the right of the principal diagonal in M’ is calculated 
by the same procedure, followed by a division by the diagonal element 
in its row of M’. 


Finally, the elements of the solution column x are determined in the
order $x_n, x_{n-1}, \ldots, x_2, x_1$, from foot to head. The element $x_n$ is identical with
$c'_n$. Each succeeding element above it is obtained as the result of subtracting
from the corresponding element of the c’ column the inner product of its row in
A’ and the x column, with all uncalculated elements of the x column imagined
to be zeros.

The preceding instructions are summarized by the equations

$$
a'_{ij} = a_{ij} - \sum_{k=1}^{j-1} a'_{ik} a'_{kj} \qquad (i \ge j) \tag{10.4.4}
$$

$$
a'_{ij} = \frac{1}{a'_{ii}} \left[ a_{ij} - \sum_{k=1}^{i-1} a'_{ik} a'_{kj} \right] \qquad (i < j) \tag{10.4.5}
$$

$$
c'_i = \frac{1}{a'_{ii}} \left[ c_i - \sum_{k=1}^{i-1} a'_{ik} c'_k \right] \tag{10.4.6}
$$

and

$$
x_i = c'_i - \sum_{k=i+1}^{n} a'_{ik} x_k \tag{10.4.7}
$$

where i and j range from 1 to n when not otherwise restricted. The derivation
of these relations is included in Appendix A. It is seen that the process defined
by (10.4.7) is identical with the back solution of the Gauss reduction, which
determines $x_n, x_{n-1}, \ldots, x_1$ from (10.3.1).
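The relations (10.4.4) to (10.4.7) translate almost line for line into code. The following Python sketch is one possible transcription, not the book's own program: the function name `crout` and the use of zero-based indices are choices made here, and no pivoting or zero-diagonal check is included.

```python
def crout(A, c):
    """Crout reduction of the augmented array [A | c].

    Returns (A_prime, c_prime, x) as lists, following (10.4.4)-(10.4.7).
    """
    n = len(A)
    Ap = [[0.0] * n for _ in range(n)]
    cp = [0.0] * n
    for j in range(n):
        # Column j, on and below the diagonal: rule 1, Eq. (10.4.4).
        for i in range(j, n):
            Ap[i][j] = A[i][j] - sum(Ap[i][k] * Ap[k][j] for k in range(j))
        # Row j, to the right of the diagonal: rule 2, Eq. (10.4.5).
        for m in range(j + 1, n):
            inner = sum(Ap[j][k] * Ap[k][m] for k in range(j))
            Ap[j][m] = (A[j][m] - inner) / Ap[j][j]
        # The c' entry of row j: Eq. (10.4.6).
        cp[j] = (c[j] - sum(Ap[j][k] * cp[k] for k in range(j))) / Ap[j][j]
    # Solution column, from foot to head: Eq. (10.4.7).
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = cp[i] - sum(Ap[i][k] * x[k] for k in range(i + 1, n))
    return Ap, cp, x
```

Each entry is formed by one uninterrupted accumulation of products followed by at most one division, which is the feature that made the scheme attractive for desk calculators. Applied to the system (10.3.2) in ordinary floating-point arithmetic, the sketch should agree with the five-place values of Sec. 10.3 to within the roundoff of the tabulated figures.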

In the important case when the coefficient array A is symmetric, so that
each element $a_{ij}$ in A above the principal diagonal is identical with the
symmetrically placed element $a_{ji}$ below the diagonal ($a_{ij} = a_{ji}$), as in the system
(10.3.2), it can be shown that each element $a'_{ij}$ in A’ above the principal diagonal
is given by the result of dividing the symmetrically placed element $a'_{ji}$ below
the diagonal by the diagonal element $a'_{ii}$. This fact leads to a considerable
reduction in labor in such cases, particularly when n is large, since then each
element below the diagonal thus can be recorded as the dividend involved in
the calculation of the symmetrically placed element before the required division
by the diagonal element is effected.

It can be shown that the elements to the right of the diagonal in M’ are 
identical with the elements which appear in corresponding positions in the 
augmented matrix of (10.3.1), obtained by the Gauss reduction. The compact- 
ness of the tabulation is a consequence of the fact that all necessary intermediate 
data are tabulated in the remaining spaces, which would normally be occupied 
by ones and zeros. 

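Indeed, it is easily verified that the elements of the matrix A’ can be
associated with a left triangular matrix L and the right triangular matrix R
just described

$$
L = \begin{bmatrix}
a'_{11} & 0 & \cdots & 0 \\
a'_{21} & a'_{22} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
a'_{n1} & a'_{n2} & \cdots & a'_{nn}
\end{bmatrix}
\qquad
R = \begin{bmatrix}
1 & a'_{12} & \cdots & a'_{1n} \\
0 & 1 & \cdots & a'_{2n} \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\tag{10.4.8}
$$

such that

$$
A = LR \tag{10.4.9}
$$

with the conventional definition of matrix multiplication.†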

The kth diagonal element $a'_{kk}$ is the number by which the kth equation
would be divided in the Gauss reduction before that equation is used to
eliminate $x_k$ from succeeding equations. In consequence of this fact, it is true
that the value of the determinant of the original coefficient matrix A is the product
of the diagonal elements of A’. In this connection, it is noted that the Crout
reduction comprises an efficient method of specifically evaluating a
determinant. When there is no associated set of equations to be solved, the c, c’,
and x columns naturally are omitted.
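Both the factorization A = LR and the determinant rule can be checked numerically. The Python sketch below is an illustration only; the helper names `crout_factor`, `split_LR`, `matmul`, and `determinant` are introduced here, and the c, c’, and x columns are omitted as the text suggests.

```python
def crout_factor(A):
    """Return the square auxiliary array A' of the Crout reduction (no pivoting)."""
    n = len(A)
    Ap = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            Ap[i][j] = A[i][j] - sum(Ap[i][k] * Ap[k][j] for k in range(j))
        for m in range(j + 1, n):
            inner = sum(Ap[j][k] * Ap[k][m] for k in range(j))
            Ap[j][m] = (A[j][m] - inner) / Ap[j][j]
    return Ap


def split_LR(Ap):
    """Separate A' into the left triangular L and the unit right triangular R."""
    n = len(Ap)
    L = [[Ap[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    R = [[1.0 if i == j else (Ap[i][j] if j > i else 0.0) for j in range(n)]
         for i in range(n)]
    return L, R


def matmul(X, Y):
    """Conventional matrix product: inner products of rows of X into columns of Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def determinant(A):
    """Determinant of A as the product of the diagonal elements of A'."""
    Ap = crout_factor(A)
    det = 1.0
    for k in range(len(A)):
        det *= Ap[k][k]
    return det
```

For the coefficient array of (10.3.2), `matmul(*split_LR(crout_factor(A)))` should reproduce A up to floating-point roundoff, and `determinant(A)` should come out close to 342.66.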

A continuous check against gross errors is afforded by adjoining to the
columns of M an additional column, each of whose elements is the sum of the
elements in the corresponding row of M. If this column is treated in the same
way as the c column, corresponding check columns are obtained and adjoined
to M’ and x. The check consists of the fact that each element in the M’ check
column should exceed by unity the sum of the elements in its row of M’ which
lie to the right of the diagonal element, whereas each element in the x check
column should exceed by unity the corresponding element in the x column itself.
(See Prob. 4.) The sudden appearance of an appreciable discrepancy will
generally indicate the commission of a gross calculational error. Small
discrepancies generally correspond to the effects of intermediate roundoffs,
effected in the steps of the reduction, which can be removed by retaining
additional significant figures in the calculation or by an alternative procedure to be
described in the following section.
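The check-column device can be sketched as follows in Python. The function names `crout_with_check` and `check_discrepancies` are hypothetical, and the arithmetic is ordinary floating point, so the discrepancies reported are at roundoff level rather than at the level a hand computation would show.

```python
def crout_with_check(A, c):
    """Crout reduction of [A | c | row sums].

    Returns (Mp, X): Mp is the auxiliary array with its check column,
    and X[i] = [x_i, check_i] holds the solution and its check column.
    """
    n = len(A)
    M = [list(map(float, A[i])) + [float(c[i])] for i in range(n)]
    for row in M:
        row.append(sum(row))          # check column: sum of the row of M
    width = n + 2
    Mp = [[0.0] * width for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            Mp[i][j] = M[i][j] - sum(Mp[i][k] * Mp[k][j] for k in range(j))
        # The c column and the check column are treated alike.
        for m in range(j + 1, width):
            inner = sum(Mp[j][k] * Mp[k][m] for k in range(j))
            Mp[j][m] = (M[j][m] - inner) / Mp[j][j]
    X = [[0.0, 0.0] for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for col in range(2):
            tail = sum(Mp[i][k] * X[k][col] for k in range(i + 1, n))
            X[i][col] = Mp[i][n + col] - tail
    return Mp, X


def check_discrepancies(Mp, X):
    """Amounts by which the two check relations fail to hold."""
    n = len(Mp)
    # M' check entry should equal 1 + (sum of entries right of the diagonal).
    row_disc = [Mp[i][n + 1] - (1.0 + sum(Mp[i][i + 1:n + 1])) for i in range(n)]
    # x check entry should equal x + 1.
    sol_disc = [X[i][1] - (X[i][0] + 1.0) for i in range(n)]
    return row_disc, sol_disc
```

A discrepancy that is large compared with the working precision points to a blunder in the row where it first appears; small ones reflect accumulated roundoff.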

For the set of Eqs. (10.3.2), the complete tabulation consists of the array
of the elements of the given matrix (and its check column, if desired)

$$
\begin{array}{rrr|r|r}
9.3746 & 3.0416 & -2.4371 & 9.2333 & 19.2124 \\
3.0416 & 6.1832 & 1.2163 & 8.2049 & 18.6460 \\
-2.4371 & 1.2163 & 8.4429 & 3.9339 & 11.1560
\end{array}
\tag{10.4.10}
$$

the array of the elements of the auxiliary matrix (and its check column)

$$
\begin{array}{rrr|r|r}
9.3746 & 0.32445 & -0.25997 & 0.98493 & 2.04941 \\
3.0416 & 5.19635 & 0.38624 & 1.00246 & 2.38870 \\
-2.4371 & 2.00702 & 7.03414 & 0.61448 & 1.61448
\end{array}
\tag{10.4.11}
$$

and the solution column (and its check column)

$$
\begin{array}{r|r}
0.89643 & 1.89643 \\
0.76512 & 1.76512 \\
0.61448 & 1.61448
\end{array}
\tag{10.4.12}
$$

if five decimal places are retained in the calculations.

† The element in row i and column j of the product is the inner product of the ith
row of the first factor into the jth column of the second.


10.5 Intermediate Roundoff Errors 


In the preceding example the gross-error check columns display no discrepancies 
through the fifth place. This fact, however, does not guarantee that the results 
are correct to five decimal places. Indeed, the relationship between the sizes of 
such discrepancies and the effects of intermediate roundoffs or gross errors is 
not a simple one. 

If the calculated solution values are substituted into the left-hand members 
of the original equations, then the presence of deviations between the resultant 
members and the original right-hand members serves to indicate the presence 
of errors due to intermediate roundoffs, or of gross errors; but again the 
magnitudes of the deviations are not dependable measures of the magnitudes of 
the solution errors. 

The following oft-cited example, which illustrates this fact, was constructed
by T. S. Wilson. It can be verified by direct calculation that the equations

$$
\begin{aligned}
10x_1 + 7x_2 + 8x_3 + 7x_4 &= 32 \\
7x_1 + 5x_2 + 6x_3 + 5x_4 &= 23 \\
8x_1 + 6x_2 + 10x_3 + 9x_4 &= 33 \\
7x_1 + 5x_2 + 9x_3 + 10x_4 &= 31
\end{aligned}
\tag{10.5.1}
$$

are such that the approximate values

$$
x_1 \approx -7.2 \qquad x_2 \approx 14.6 \qquad x_3 \approx -2.5 \qquad x_4 \approx 3.1 \tag{10.5.2}
$$

reduce the respective left-hand members to 31.9, 23.1, 32.9, and 31.1, whereas
the approximate values

$$
x_1 \approx 0.18 \qquad x_2 \approx 2.36 \qquad x_3 \approx 0.65 \qquad x_4 \approx 1.21 \tag{10.5.3}
$$

reduce these members to the more nearly correct values 31.99, 23.01, 32.99,
and 31.01; but the true solution is

$$
x_1 = x_2 = x_3 = x_4 = 1 \tag{10.5.4}
$$

Thus we see that a poor approximation to the true solution may “almost”
satisfy the relevant equations.
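The figures quoted for Wilson's example are easy to reproduce. In the Python sketch below the data are taken from (10.5.1) to (10.5.4); the helper names `left_members` and `max_abs_diff` are introduced here for illustration.

```python
# Wilson's system (10.5.1) and the trial solutions (10.5.2)-(10.5.4).
A = [[10, 7, 8, 7],
     [7, 5, 6, 5],
     [8, 6, 10, 9],
     [7, 5, 9, 10]]
c = [32, 23, 33, 31]
poor = [-7.2, 14.6, -2.5, 3.1]
better = [0.18, 2.36, 0.65, 1.21]
exact = [1.0, 1.0, 1.0, 1.0]


def left_members(A, x):
    """Values of the left-hand members when x is substituted."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def max_abs_diff(u, v):
    """Largest componentwise difference between two columns."""
    return max(abs(a - b) for a, b in zip(u, v))


for name, x in (("poor", poor), ("better", better), ("exact", exact)):
    lhs = left_members(A, x)
    print(name,
          "largest residual:", round(max_abs_diff(c, lhs), 6),
          "largest solution error:", round(max_abs_diff(exact, x), 6))
```

The residuals shrink from about 0.1 to about 0.01 between the first two trial solutions, yet the largest solution errors are 13.6 and 1.36, which makes the point of the paragraph concrete: a small residual does not by itself certify a small error.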

The precise analytical treatment of roundoff effects is complicated, 
particularly when many equations are involved. Error bounds are difficult to 
determine and, when found, generally are extremely conservative. Statistical 
results are more realistic but are inherently incapable of affording error bounds. 
In practice, the following simple method of discovering and effectively removing 
roundoff-error effects usually suffices. 

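If substitution of the calculated values $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n$ into the left-hand
members of (10.2.1) yields $\bar{c}_1, \bar{c}_2, \ldots, \bar{c}_n$, so that

$$
\begin{aligned}
a_{11}\bar{x}_1 + a_{12}\bar{x}_2 + \cdots + a_{1n}\bar{x}_n &= \bar{c}_1 \\
&\;\;\vdots \\
a_{n1}\bar{x}_1 + a_{n2}\bar{x}_2 + \cdots + a_{nn}\bar{x}_n &= \bar{c}_n
\end{aligned}
$$

whereas the true values are to satisfy the equations

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= c_1 \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= c_n
\end{aligned}
$$

there follows, by subtraction,

$$
\begin{aligned}
a_{11}\,\delta x_1 + a_{12}\,\delta x_2 + \cdots + a_{1n}\,\delta x_n &= \delta c_1 \\
&\;\;\vdots \\
a_{n1}\,\delta x_1 + a_{n2}\,\delta x_2 + \cdots + a_{nn}\,\delta x_n &= \delta c_n
\end{aligned}
\tag{10.5.5}
$$

where

$$
\delta x_k = x_k - \bar{x}_k \qquad \delta c_k = c_k - \bar{c}_k \tag{10.5.6}
$$

Thus the necessary corrections $\delta x_1, \ldots, \delta x_n$ satisfy a set of equations which
differs from the original set only in that each $c_k$ is replaced by the residual
$c_k - \bar{c}_k$.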

If this set could be solved without roundoff, the corrections thus would
be obtained exactly, provided that the $\delta c_k$ were themselves computed without
roundoff. But since this situation generally will not exist, the corrections
obtained are themselves approximate. New residuals can then be calculated,
and the process can be iterated, if so desired. Each such calculation is particularly
simple in the Crout procedure, since only the c column of M, the c’ column of
M’, and the solution column need be recalculated, all other data being
unchanged.

Whereas there is no certainty that the iteration will converge rapidly
(or at all), and whereas relevant criteria are difficult to apply in practical cases,
both the presence and the rapidity of convergence generally are confirmed by
actual calculation. Thus the method usually is an efficient one and, by
successive iteration, without increasing the number of significant figures in the
elements of A’ but with the residuals $\delta c_i$ and the corrections $\delta x_i$ expressed in
units of the leading decimal place in the $\delta c$’s, it is usually possible to stabilize
the desired number of significant figures in the approximate solution without an
excessive amount of computation. Clearly, the same increase in accuracy could
be obtained alternatively, but generally less conveniently, by repeating the
original calculation with overall retention of additional significant figures.
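The correction process can be imitated in Python by deliberately limiting the precision of the reduction while forming the residuals in full double precision. This is a sketch under stated assumptions: rounding to a fixed number of significant figures (`round_sig`) stands in for the hand computer's bookkeeping, so the intermediate figures need not match those of a five-decimal hand calculation, and all function names are introduced here.

```python
import math


def round_sig(v, sig):
    """Round v to `sig` significant figures (imitates limited-precision work)."""
    if v == 0.0:
        return 0.0
    return round(v, sig - 1 - math.floor(math.log10(abs(v))))


def crout_factor(A, sig):
    """Auxiliary array A', each entry rounded to `sig` significant figures."""
    n = len(A)
    Ap = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            v = A[i][j] - sum(Ap[i][k] * Ap[k][j] for k in range(j))
            Ap[i][j] = round_sig(v, sig)
        for m in range(j + 1, n):
            v = (A[j][m] - sum(Ap[j][k] * Ap[k][m] for k in range(j))) / Ap[j][j]
            Ap[j][m] = round_sig(v, sig)
    return Ap


def crout_solve(Ap, c, sig):
    """c' column and solution column for a given right-hand column c."""
    n = len(Ap)
    cp = [0.0] * n
    for i in range(n):
        v = (c[i] - sum(Ap[i][k] * cp[k] for k in range(i))) / Ap[i][i]
        cp[i] = round_sig(v, sig)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        v = cp[i] - sum(Ap[i][k] * x[k] for k in range(i + 1, n))
        x[i] = round_sig(v, sig)
    return x


def refine(A, c, sig=6, sweeps=3):
    """Solve A x = c, then apply `sweeps` residual corrections via (10.5.5)."""
    Ap = crout_factor(A, sig)            # A' is formed once and reused
    x = crout_solve(Ap, c, sig)
    for _ in range(sweeps):
        # Residuals are formed with more figures than the reduction carries.
        r = [ci - sum(a * xi for a, xi in zip(row, x)) for row, ci in zip(A, c)]
        dx = crout_solve(Ap, r, sig)     # only the c, c', and x columns change
        x = [xi + di for xi, di in zip(x, dx)]
    return x
```

The design point worth noticing is that `crout_factor` is called only once; each sweep costs just one residual evaluation and one pass through the c’ and x columns.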

In the case of the preceding example, the residuals corresponding to the 
approximate solution (10.4.12) are found to be 


$$
\delta c_1 = -1.2462 \times 10^{-5} \qquad \delta c_2 = 3.6504 \times 10^{-5} \qquad \delta c_3 = -1.9095 \times 10^{-5}
$$

and the approximate corrections are found to be

$$
\delta x_1 = -0.59421 \times 10^{-5} \qquad \delta x_2 = 0.98893 \times 10^{-5} \qquad \delta x_3 = -0.54016 \times 10^{-5}
$$

if five significant figures are retained, yielding the improved values

$$
x_1 = 0.8964240579 \qquad x_2 = 0.7651298893 \qquad x_3 = 0.6144745984
$$

The new residuals are found (by substitution of the calculated values of $\delta x_1$,
$\delta x_2$, and $\delta x_3$ into the equations which yielded them) to be of the order of $10^{-10}$
and another iteration would supply 14-place accuracy, the rounded 10-place
values agreeing with those given above except for a one-unit change in the tenth 
digit of x,. 

If the coefficients and right-hand members of the original set of equations 
are only four-decimal-place approximations to true values, the preceding 
retention of 10 or more decimal places may be expected to be foolish, since it is
useless to strive for a higher degree of accuracy than that which is compatible 
with errors inherent in the given system. This problem is to be considered 
explicitly in Sec. 10.7. 

Large relative errors and/or failure of this iterative correction process
occur most often when a diagonal element $a'_{kk}$ which is small relative to the
elements to its right appears in A’, in a position preceding the last one, since the
necessary subsequent divisions by $a'_{kk}$ then will propagate its error unfavorably
into subsequent calculations. Such situations often can be avoided by use of
the following procedures.

First, before the reduction is begun, the equal members of certain equations
may be multiplied by constants as necessary in such a way that the largest
coefficients in the several equations become of comparable magnitude. This
process, as applied to the associated augmented matrix, is sometimes called
equilibration by rows. Its principal purpose, in the present case, is to make the
succeeding process effective.

Second, also in advance, the equations and/or unknowns may be re- 
ordered as necessary (with a relabeling of unknowns if they are, in fact, reordered) 
in such a way that each diagonal element in the coefficient matrix is dominant 
in its row, insofar as this is possible. 

Finally, use may be made of a so-called pivoting (or “partial pivoting”)
process in which, at the kth stage of the Gauss reduction, instead of always
using the kth transformed equation to eliminate $x_k$ from all following equations,
it is determined whether any of the n − k following transformed equations have
an $x_k$ coefficient which is larger in magnitude than that in the kth equation. If
so, then the equation with the largest $x_k$ coefficient is interchanged with the
current kth equation and is used instead for the elimination of $x_k$ from the
transformed equations which then follow it.

Correspondingly, in the Crout reduction, immediately after all elements
in the kth column of M’ have been determined, one may compare the diagonal
element $a'_{kk}$ in that column with all elements below it. If one or more of these
elements exceeds $a'_{kk}$ in magnitude, then one may interchange the row containing
the largest such coefficient with the kth row of M’, effect the same interchange
on the two corresponding rows of M, and then proceed as before. (For hand
calculation on a desk calculator, the reordering of rows is somewhat
time-consuming. Although it is not difficult to devise a procedure which
accomplishes the same result without an actual physical interchange of rows, it is more
easily discovered than described and hence is not detailed here.)

Clearly, generally there is little virtue in effecting a pivot interchange 
unless the magnitude of the incumbent is substantially smaller than that of a 
competitor. In addition, in those frequently occurring cases where all diagonal 
elements of A are rather strongly dominant in both their rows and their columns, 
as in (10.4.12), pivoting rarely is needed. On the other hand, there are un- 
pleasant situations (considered in Sec. 10.7) which cannot be remedied by pivot- 
ing or any other available device. 

As a simple illustration of the procedure, we deal with the system

$$
\begin{aligned}
4.3x_1 + 1.6x_2 - 2.3x_3 &= 3.6 \\
3.8x_1 + 1.4x_2 + 1.5x_3 &= 6.7 \\
1.8x_1 + 1.2x_2 + 3.1x_3 &= 6.1
\end{aligned}
\tag{10.5.7}
$$

supposing for present purposes that the coefficients and right-hand members
are exact, and retaining only three decimal places throughout the reduction
process in order to emphasize the effects of intermediate roundoffs.

Here no significant preliminary modifications seem to be indicated. If
the Crout reduction is used without pivoting, the auxiliary matrix is obtained
in the form

$$
\begin{array}{rrr|r}
4.3 & 0.372 & -0.535 & 0.837 \\
3.8 & -0.014 & -252.357 & -251.386 \\
1.8 & 0.530 & 137.812 & 1.000
\end{array}
\tag{10.5.8}
$$

and there results

$$
x_1 \approx 1.011 \qquad x_2 \approx 0.971 \qquad x_3 \approx 1.000 \tag{10.5.9}
$$


Substitution of these approximations into the left-hand members of (10.5.7)
yields the three-place values 3.601, 6.701, and 6.085 and hence also the residuals
−0.001, −0.001, and 0.015. Here the use of (10.5.5) produces the corrections
$\delta x_1 = -0.026$, $\delta x_2 = 0.071$, and $\delta x_3 \approx 0.000$; and, in correspondence with
the “improved values” $x_1 \approx 0.985$, $x_2 \approx 1.041$, and $x_3 \approx 1.000$, the residuals
are found to round to −0.001, 0.000, and 0.022 so that the iterative correction
process appears to be ineffective.

Clearly, the small diagonal element $a'_{22} = -0.014$ is the probable source
of the trouble. Indeed, the pivoting process would have dictated an interchange
of rows 2 and 3 after the completion of the second column of M’, in which case
the auxiliary matrix would be obtained as

$$
\begin{array}{rrr|r}
4.3 & 0.372 & -0.535 & 0.837 \\
1.8 & 0.530 & 7.666 & 8.667 \\
3.8 & -0.014 & 3.640 & 1.000
\end{array}
\tag{10.5.10}
$$

and the resultant three-place approximate solution

$$
x_1 \approx 1.000 \qquad x_2 \approx 1.001 \qquad x_3 \approx 1.000 \tag{10.5.11}
$$


possesses only a small relative error which can be corrected by the iterative 
process. 
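The effect of the row interchange can be reproduced with a Crout reduction that rounds every recorded entry to three decimal places. The Python sketch below assumes that each entry is accumulated exactly from the already rounded entries and rounded only when it is written down; the function name `crout_rounded` and its parameters are introduced here for illustration.

```python
def crout_rounded(A, c, places=3, pivot=False):
    """Crout reduction of [A | c] with every recorded entry rounded to `places`
    decimals; optional row interchange after each column of M' is completed.

    Returns (Mp, x): the auxiliary array (with its c' column) and the solution.
    """
    n = len(A)
    M = [list(map(float, A[i])) + [float(c[i])] for i in range(n)]
    Mp = [[0.0] * (n + 1) for _ in range(n)]
    for j in range(n):
        # Complete column j on and below the diagonal.
        for i in range(j, n):
            v = M[i][j] - sum(Mp[i][k] * Mp[k][j] for k in range(j))
            Mp[i][j] = round(v, places)
        if pivot:
            # Bring the entry of largest magnitude in this column to the diagonal,
            # interchanging the same rows of M' and of M.
            p = max(range(j, n), key=lambda i: abs(Mp[i][j]))
            if p != j:
                M[j], M[p] = M[p], M[j]
                Mp[j], Mp[p] = Mp[p], Mp[j]
        # Complete row j to the right of the diagonal, c' entry included.
        for m in range(j + 1, n + 1):
            v = (M[j][m] - sum(Mp[j][k] * Mp[k][m] for k in range(j))) / Mp[j][j]
            Mp[j][m] = round(v, places)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        v = Mp[i][n] - sum(Mp[i][k] * x[k] for k in range(i + 1, n))
        x[i] = round(v, places)
    return Mp, x
```

Because only rows are interchanged, the unknowns keep their original order and no relabeling is needed. Calling the function on (10.5.7) with `pivot=False` and then with `pivot=True` shows the second diagonal entry changing from −0.014 to 0.530, and the error in the computed $x_2$ shrinking accordingly.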


10.6 Determination of the Inverse Matrix 


From (10.2.7) and (10.2.8), it follows that the kth column of the matrix (10.2.8),
which is the inverse of the coefficient matrix (10.2.2), is the solution column
corresponding to the result of setting $c_k = 1$ and all other c’s equal to zero in
(10.2.1). Thus if, in place of the single c column in (10.4.1), we insert the square
array

$$
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\tag{10.6.1}
$$

of n columns, and treat each column of this array as a c column, we will obtain
finally the array (10.2.8) in place of the single x column. That is, the resultant
solution array will supply the inverse of the coefficient matrix of the given
set of equations. A check column can be included, if so desired, and the rules
given for its use apply as stated.
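A minimal Python sketch of this use of the Crout reduction follows. The function name `crout_inverse` is hypothetical, the arithmetic is ordinary floating point, and no pivoting is included.

```python
def crout_inverse(A):
    """Inverse of A, obtained by adjoining the unit array (10.6.1) in place of
    the c column and carrying every column through the Crout reduction."""
    n = len(A)
    M = [list(map(float, A[i])) + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    Mp = [[0.0] * (2 * n) for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            Mp[i][j] = M[i][j] - sum(Mp[i][k] * Mp[k][j] for k in range(j))
        # All n adjoined columns are treated exactly as a c column would be.
        for m in range(j + 1, 2 * n):
            inner = sum(Mp[j][k] * Mp[k][m] for k in range(j))
            Mp[j][m] = (M[j][m] - inner) / Mp[j][j]
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        # Back solution for the column of the inverse belonging to c_col = 1.
        for i in range(n - 1, -1, -1):
            tail = sum(Mp[i][k] * inv[k][col] for k in range(i + 1, n))
            inv[i][col] = Mp[i][n + col] - tail
    return inv
```

The square part A’ is computed once; only the adjoined columns and their back solutions scale with the number of right-hand columns.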

Analysis shows that $n^3$ long operations (multiplications and divisions) are
needed, in general, to determine the inverse of the n × n matrix A. The
computation of a solution column x in correspondence to a c column by use of Eqs.
(10.2.7) then requires $n^2$ additional long operations (assuming no zero elements).
On the other hand, once the Crout auxiliary matrix A’ has been determined by
$(n^3 - n)/3$ long operations, the derivation of the x column from a c column by
its use also requires only $n^2$ additional long operations. Thus, from this point of
view, the determination of $A^{-1}$ for the purpose of computing the x column for
a number of different c columns is not an efficient procedure.

However, the determination of the inverse matrix is necessary when
$x_1, \ldots, x_n$ are to be expressed explicitly as combinations of $c_1, \ldots, c_n$ by means
of (10.2.7), for the purpose of studying the analytical dependence of the x’s
on the c’s. It is also desirable in certain statistical computations where the
elements of the inverse have significance in the underlying theory (see Sec. 7.4).

In the case of the example (10.3.2), the auxiliary array corresponding to
the given array

$$
\begin{array}{rrr}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{array}
$$

is found to be

$$
\begin{array}{rrr}
0.106671 & 0 & 0 \\
-0.062438 & 0.192443 & 0 \\
0.054773 & -0.054909 & 0.142164
\end{array}
$$

and the solution array is obtained in the form

$$
\begin{array}{rrr}
0.148032 & -0.083594 & 0.054774 \\
-0.083594 & 0.213651 & -0.054909 \\
0.054773 & -0.054909 & 0.142164
\end{array}
\tag{10.6.2}
$$

Here six decimal places were retained, in order that five significant figures might
be afforded after a rounding.† It may be noticed that the inverse matrix
possesses the same symmetry as the given matrix. (The single discrepancy of one
unit is due to roundoff.)

The result obtained is equivalent to the statement that, apart from the
effects of roundoffs, the solution of the set (10.3.2) would be of the form

$$
\begin{aligned}
x_1 &= 0.148032c_1 - 0.083594c_2 + 0.054774c_3 \\
x_2 &= -0.083594c_1 + 0.213651c_2 - 0.054909c_3 \\
x_3 &= 0.054773c_1 - 0.054909c_2 + 0.142164c_3
\end{aligned}
\tag{10.6.3}
$$

if the right-hand members of (10.3.2) were replaced by $c_1$, $c_2$, and $c_3$, respectively.
In particular, the substitution of the actual right-hand members into (10.6.3)
again leads to (10.4.12). The elements of (10.6.2) are the reduced cofactors
defined in (10.2.6), in accordance with (10.2.7). Since, as stated in Sec. 10.4,
the determinant D of the given matrix is the product of the diagonal elements of
(10.4.11)

$$
D = (9.3746)(5.19635)(7.03414) = 342.66
$$

the array of the cofactors themselves is obtained by multiplication by D (and
interchange of rows and columns in the more general case).

† For purposes of illustrating the technique, we again ignore the fact that inherent
errors due to roundoff in the given data may adversely affect the significance of
certain of the digits.


10.7 Inherent Errors


In addition to the errors due to intermediate roundoffs, which usually can be
detected and removed by the methods previously described, errors due to possible
inaccuracies in the coefficients and right-hand members of the given equations
themselves must be taken into account. For example, Wilson’s equation set
(10.5.1) shows that relatively small changes in the right-hand members of an
apparently innocuous set of equations (with a symmetric and, in fact, positive
definite coefficient matrix) can bring about very severe changes in the components
of the solution. (See also Prob. 23.)

In order to investigate these errors, we assume now that the errors due to
intermediate roundoffs are negligible. We then suppose that the set actually
solved is

$$
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= c_1 \\
&\;\;\vdots \\
a_{n1}x_1 + \cdots + a_{nn}x_n &= c_n
\end{aligned}
\tag{10.7.1}
$$

whereas the true values of the coefficients are $a_{ij} + \delta a_{ij}$ and the true values
of the right-hand members are $c_i + \delta c_i$. If we denote the true value of the ith
unknown by $x_i + \delta x_i$, there follows also

$$
\begin{aligned}
(a_{11} + \delta a_{11})(x_1 + \delta x_1) + \cdots + (a_{1n} + \delta a_{1n})(x_n + \delta x_n) &= c_1 + \delta c_1 \\
&\;\;\vdots \\
(a_{n1} + \delta a_{n1})(x_1 + \delta x_1) + \cdots + (a_{nn} + \delta a_{nn})(x_n + \delta x_n) &= c_n + \delta c_n
\end{aligned}
\tag{10.7.2}
$$

and the result of subtracting Eqs. (10.7.1) from (10.7.2) is expressible in the
form

$$
\begin{aligned}
a_{11}\,\delta x_1 + \cdots + a_{1n}\,\delta x_n &= \delta c_1 - (x_1\,\delta a_{11} + \cdots + x_n\,\delta a_{1n}) \\
&\;\;\vdots \\
a_{n1}\,\delta x_1 + \cdots + a_{nn}\,\delta x_n &= \delta c_n - (x_1\,\delta a_{n1} + \cdots + x_n\,\delta a_{nn})
\end{aligned}
\tag{10.7.3}
$$

if products of errors, of the form $\delta a_{ik}\,\delta x_k$, are assumed to be relatively negligible.
Thus, if the inherent errors $\delta a_{ij}$ and $\delta c_i$ were known, the corresponding
solution errors $\delta x_i$ would be obtained (to a degree of accuracy consistent with
this assumption) by solving a set of equations which differs from the set actually
solved only in that the right-hand member $c_i$ is to be replaced by $\eta_i$, where

$$
\eta_i = \delta c_i - (x_1\,\delta a_{i1} + \cdots + x_n\,\delta a_{in}) \tag{10.7.4}
$$

However, in practice, it is usually known only that the errors $\delta a_{ij}$ and $\delta c_i$
do not exceed a certain positive number, say, $\varepsilon$, in magnitude, so that

$$
-\varepsilon \le \delta a_{ij} \le \varepsilon \qquad -\varepsilon \le \delta c_i \le \varepsilon \tag{10.7.5}
$$

Hence, in such cases, the value of $\eta_i$ is not known and we are certain only that

$$
|\eta_i| \le E \tag{10.7.6}
$$

where

$$
E = (1 + |x_1| + |x_2| + \cdots + |x_n|)\,\varepsilon \tag{10.7.7}
$$

The solution of the set (10.7.3) can be expressed in the form

$$
\delta x_k = A_{1k}\eta_1 + A_{2k}\eta_2 + \cdots + A_{nk}\eta_n \qquad (k = 1, 2, \ldots, n) \tag{10.7.8}
$$

with the notation of (10.2.6). Thus, if (10.7.6) is true, there follows

$$
|\delta x_k| \le (|A_{1k}| + |A_{2k}| + \cdots + |A_{nk}|)\,E \tag{10.7.9}
$$

The reduced cofactors involved in this expression are the elements of the kth
row of the inverse of the coefficient matrix. Thus, if the elements of the inverse
matrix are calculated, approximate upper bounds on the effects of inherent
errors are obtainable from (10.7.9). They are not strictly upper bounds, since
they were derived under the assumption that terms of the form $\delta a_{ik}\,\delta x_k$ are
small in magnitude relative to E, as defined in (10.7.7). However, unless the
upper bounds predicted under this assumption are such that the truth of that
assumption is contradicted, they may be accepted as close approximations to
the true upper bounds (which could be attained, in any case, only when all
the errors combined in the most unfavorable way).

In the case of (10.3.2), if it is supposed that the coefficients and right-hand
members are merely rounded approximations to true values, there follows
$E = 3.28\varepsilon = 1.64 \times 10^{-4}$, and reference to (10.6.2) yields the estimates

$$
|\delta x_1|_{\max} \approx 0.29E = 0.48 \times 10^{-4} \qquad
|\delta x_2|_{\max} \approx 0.35E = 0.57 \times 10^{-4} \qquad
|\delta x_3|_{\max} \approx 0.25E = 0.41 \times 10^{-4}
$$

Thus, we could be confident only that the solution of the true equations is
such that

$$
0.89637 < x_1 < 0.89648 \qquad 0.76507 < x_2 < 0.76519 \qquad 0.61443 < x_3 < 0.61452 \tag{10.7.10}
$$

so that we could write $x_1 = 0.8964$, $x_2 = 0.7651$, and $x_3 = 0.6145$, with
the last digit in doubt by one unit in each case.
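The estimates just quoted follow from (10.7.7) and (10.7.9) by a few additions and multiplications, as the Python sketch below shows. The inverse and the solution are copied from (10.6.2) and (10.4.12); the value of `eps`, half a unit in the fourth decimal place, is inferred from the statement $E = 3.28\varepsilon = 1.64 \times 10^{-4}$, and the function name is introduced here.

```python
def inherent_error_bounds(inverse, x, eps):
    """Approximate bounds (10.7.9) on the solution errors.

    inverse : rows of the inverse of the coefficient matrix
    x       : computed solution column
    eps     : bound on the magnitude of each data error
    Returns (E, bounds) with E as in (10.7.7).
    """
    E = (1.0 + sum(abs(v) for v in x)) * eps
    # Row k of the inverse holds the reduced cofactors entering delta x_k.
    bounds = [sum(abs(v) for v in row) * E for row in inverse]
    return E, bounds


inverse = [[0.148032, -0.083594, 0.054774],
           [-0.083594, 0.213651, -0.054909],
           [0.054773, -0.054909, 0.142164]]
x = [0.89643, 0.76512, 0.61448]
eps = 0.5e-4   # assumption: data rounded to four decimal places

E, bounds = inherent_error_bounds(inverse, x, eps)
print("E =", E)
for k, b in enumerate(bounds, start=1):
    print("bound on |dx_%d| = %.3e" % (k, b))
```

The printed bounds differ slightly in the second figure from the values in the text because the text rounds the row sums to two figures before multiplying by E.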

Unless the given system of equations is to be solved for literal values of
the right-hand members, so that the determination of the inverse matrix is
advisable in any case, an error estimate which does not involve the calculation
of the elements of that matrix is desirable. It is clear that the upper bound
(10.7.9) would correspond to the $\delta x_k$ which is the kth element of the solution
column satisfying a set of equations of the form

$$
\begin{aligned}
a_{11}\,\delta x_1 + \cdots + a_{1n}\,\delta x_n &= \pm E \\
&\;\;\vdots \\
a_{n1}\,\delta x_1 + \cdots + a_{nn}\,\delta x_n &= \pm E
\end{aligned}
\tag{10.7.11}
$$

for some choice of the ambiguous signs of each of the n right-hand members.
However, the manner in which the signs are to be associated with successive
equations cannot be determined unless the signs of the relevant cofactors of the
coefficients of $\delta x_k$ in these equations are known. Furthermore, a different
combination of signs may be needed to maximize each of the $\delta x$’s.

Reference to the Crout reduction shows that each of the elements in the
solution column is obtained as a linear combination of the elements of the c’
column of the auxiliary matrix, each of which is, in turn, a linear combination
of the elements of the c column (that is, of the original right-hand members).
Thus, if each entry in the c column were ±E, and if, in the calculation of each
element of the c’ and x columns, we were to replace all subtractions by additions,
it follows that no element of the resultant x column could be exceeded in
magnitude by a corresponding element obtained by solving (10.7.11) with any
prescribed combination of signs.

Hence, if an additional column with unity as each element is adjoined to
the matrix M and is transformed just as the c column except for the fact that
all subtractions are replaced by additions, the result of multiplying by E the
elements of the final corresponding column, adjoined to the solution column,
gives (approximate) upper bounds on the possible errors in the corresponding
elements of the solution column, due to possible errors in the coefficients and
right-hand members of the given equations. (This procedure appears to be due
to Milne [1949].) The bounds obtained in this way usually exceed the more
precise bounds afforded by (10.7.9), but are obtained much more simply.

In the case of the illustrative example, the inherent-error check columns 
adjoined to the given, auxiliary, and final arrays are found to be 


$$
\begin{array}{c|c|c}
1 & 0.10667 & 0.28640 \\
1 & 0.25488 & 0.35215 \\
1 & 0.25185 & 0.25185
\end{array}
$$


Here it happens that the approximate upper bounds, obtained by multiplying 
the successive elements of the last column by E, are identical with those afforded 
by the preceding analysis. 
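The inherent-error check column can be generated with a few lines added to a Crout routine. The Python sketch below rests on one interpretive assumption, stated plainly: “replacing all subtractions by additions” is read as accumulating the magnitudes of every product and divisor, which is the reading that reproduces the tabulated columns for (10.3.2). The function name `error_check_columns` is hypothetical.

```python
def error_check_columns(A):
    """Inherent-error check columns for the coefficient array A.

    Returns (cp, b): the column adjoined to the auxiliary array and the
    column adjoined to the solution; b[k] * E approximates a bound on |dx_k|.
    """
    n = len(A)
    # Square part of the Crout auxiliary array (no pivoting).
    Ap = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            Ap[i][j] = A[i][j] - sum(Ap[i][k] * Ap[k][j] for k in range(j))
        for m in range(j + 1, n):
            inner = sum(Ap[j][k] * Ap[k][m] for k in range(j))
            Ap[j][m] = (A[j][m] - inner) / Ap[j][j]
    # Column of ones treated like a c column, with magnitudes added throughout.
    cp = [0.0] * n
    for i in range(n):
        cp[i] = (1.0 + sum(abs(Ap[i][k]) * cp[k] for k in range(i))) / abs(Ap[i][i])
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        b[i] = cp[i] + sum(abs(Ap[i][k]) * b[k] for k in range(i + 1, n))
    return cp, b
```

For the coefficient array of (10.3.2) the returned columns should agree with the second and third columns tabulated above to about five decimal places.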

The fact that this situation is not a general one may be illustrated, for
example, by the case when only two equations are involved. Here, the estimates
afforded by (10.7.9) are

$$
|\delta x_1|_{\max} \approx \frac{|a_{22}| + |a_{12}|}{|D|}\,E \qquad
|\delta x_2|_{\max} \approx \frac{|a_{11}| + |a_{21}|}{|D|}\,E
$$

whereas it is readily verified that the estimates afforded by the simpler procedure
are

$$
|\delta x_1|_{\max} \approx \frac{|a_{11}a_{22} - a_{12}a_{21}| + |a_{12}a_{21}| + |a_{11}a_{12}|}{|a_{11}|\,|D|}\,E \qquad
|\delta x_2|_{\max} \approx \frac{|a_{11}| + |a_{21}|}{|D|}\,E
$$

Although the estimates for $|\delta x_2|_{\max}$ are identical, it is seen that the latter estimate
for $|\delta x_1|_{\max}$ exceeds the former estimate except in special cases, such as that in
which $a_{11}a_{22} > a_{12}a_{21} > 0$.

Systems in which small relative errors in the coefficients or right-hand
members, or in the process of solution, may correspond to large relative errors
in the solution are often said to be ill-conditioned systems and are essentially
characterized by the fact that the determinant of the coefficient matrix is small in
magnitude relative to certain of the cofactors of elements of that matrix, when
the matrix has been normalized such that its largest element is of the order of
magnitude of 1. When such a system is encountered, one must either make the
inherent errors small by retaining a large number of significant figures in the
given data (when this is possible) or control the effects of inherent errors as
closely as possible and then accept the fact that the inaccuracies in the given
data still may permit the solution to be determined only within relatively wide
error limits. Wilson’s example (10.5.1), with the coefficients and/or right-hand
members modified “more realistically” by small perturbations, serves to illustrate
this fact rather dramatically.

† More specific measures of the “condition” of a matrix have been proposed by von
Neumann and Goldstine (see von Neumann and Goldstine [1947] and Goldstine and
von Neumann [1951]), Turing [1948], and others. Whereas these measures are of
marked theoretical importance, their usefulness in explicit numerical situations is
limited by the amount of computation involved in their evaluation.


10.8 Tridiagonal Sets of Equations


Systems of equations of the special form

$$
\begin{aligned}
d_1x_1 + f_1x_2 &= c_1 \\
e_2x_1 + d_2x_2 + f_2x_3 &= c_2 \\
e_3x_2 + d_3x_3 + f_3x_4 &= c_3 \\
&\;\;\vdots \\
e_{n-1}x_{n-2} + d_{n-1}x_{n-1} + f_{n-1}x_n &= c_{n-1} \\
e_nx_{n-1} + d_nx_n &= c_n
\end{aligned}
\tag{10.8.1}
$$

are of frequent occurrence in practice and are often said to be tridiagonal, since
only the diagonal elements $d_i$ and the adjacent elements $e_i$ and $f_i$ may differ
from zero in the coefficient matrix A associated with any such system. It can be
seen that the Crout reduction preserves this property, in the sense that the
submatrix A’ in the auxiliary matrix M’ of (10.4.2) then also is of tridiagonal form.
Furthermore, by virtue of the fact that the e’s have no nonzero left neighbors
and the f’s no nonzero upper neighbors, it is easily verified that here the
relationships between the elements of M’ and the corresponding elements of M take
the simplified forms

$$
e'_i = e_i \qquad (i = 2, 3, \ldots, n) \tag{10.8.2}
$$

$$
d'_1 = d_1 \qquad d'_i = d_i - e_i f'_{i-1} \qquad (i = 2, 3, \ldots, n) \tag{10.8.3}
$$

$$
f'_i = \frac{f_i}{d'_i} \qquad (i = 1, 2, \ldots, n-1) \tag{10.8.4}
$$

and

$$
c'_1 = \frac{c_1}{d'_1} \qquad c'_i = \frac{c_i - e_i c'_{i-1}}{d'_i} \qquad (i = 2, 3, \ldots, n) \tag{10.8.5}
$$

Finally, the relations of (10.4.7) reduce to the forms

$$
x_n = c'_n \qquad x_i = c'_i - f'_i x_{i+1} \qquad (i = n-1, n-2, \ldots, 1) \tag{10.8.6}
$$


For the purpose of compactness, it is convenient to record the coefficients
and right-hand members of (10.8.1) in the array

$$
P =
\left[\begin{array}{ccc|c}
 & d_1 & f_1 & c_1 \\
e_2 & d_2 & f_2 & c_2 \\
\vdots & \vdots & \vdots & \vdots \\
e_{n-1} & d_{n-1} & f_{n-1} & c_{n-1} \\
e_n & d_n & & c_n
\end{array}\right]
\tag{10.8.7}
$$


so that the diagonal elements of A are in the second column, the subdiagonal 
elements in the first column, and the superdiagonal elements in the third column. 
If a corresponding array P’, with primed elements, is defined in correspondence 
with the Crout auxiliary matrix, its elements may be determined by use of 
(10.8.2) to (10.8.5), after which the elements of the solution x are obtained by use 
of (10.8.6). 

For this purpose, we notice first that the first column of P' is identical with
the first column of P. To complete the first row of P', we next evaluate the
quantities

$$ d_1' = d_1 \qquad f_1' = \frac{f_1}{d_1'} \qquad c_1' = \frac{c_1}{d_1'} $$

Then, to complete the second row of P', we determine successively

$$ d_2' = d_2 - e_2 f_1' \qquad f_2' = \frac{f_2}{d_2'} \qquad c_2' = \frac{c_2 - e_2 c_1'}{d_2'} $$


The third row is completed by use of the same formulas with all subscripts
advanced by unity and the remaining rows of P' are completed, in order, in the
same way (except that $f_n'$ is not needed in the last row). The elements of x then
are determined successively, from the last two columns of P', in the reverse order

$$ x_n = c_n' \qquad x_{n-1} = c_{n-1}' - f_{n-1}' x_n \qquad \ldots \qquad x_1 = c_1' - f_1' x_2 $$


It may be noticed that the complete process requires a maximum of 3n - 3
multiplications, 2n - 1 divisions, and 3n - 3 additions, when the system
comprises n equations, and it compares favorably with other procedures in
this respect.

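As a simple illustration, the system

$$
\begin{aligned}
2x_1 - x_2 &= 6 \\
-x_1 + 3x_2 - 2x_3 &= 1 \\
-2x_2 + 4x_3 - 3x_4 &= -2 \\
-3x_3 + 5x_4 &= 1
\end{aligned}
$$

corresponds to the array

$$
P =
\left[\begin{array}{ccc|c}
 & 2 & -1 & 6 \\
-1 & 3 & -2 & 1 \\
-2 & 4 & -3 & -2 \\
-3 & 5 & & 1
\end{array}\right]
$$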

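and it may be verified that the auxiliary array P' and the solution column x are
obtained in the forms

$$
P' =
\left[\begin{array}{ccc|c}
 & 2 & -\tfrac{1}{2} & 3 \\
-1 & \tfrac{5}{2} & -\tfrac{4}{5} & \tfrac{8}{5} \\
-2 & \tfrac{12}{5} & -\tfrac{5}{4} & \tfrac{1}{2} \\
-3 & \tfrac{5}{4} & & 2
\end{array}\right]
\qquad
x = \{5,\ 4,\ 3,\ 2\}
$$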


The check columns can be used exactly as before, if so desired, when 
account is taken of the fact that the second column of P (or of P’) comprises 
the elements in the principal diagonal of M (or of M’). It follows also that the 
determinant of the coefficient matrix associated with (10.8.1) is the product of 
the elements in the second column of P’. In the preceding example, this deter- 
minant thus has the value 15. 
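
The reduction (10.8.2) to (10.8.6) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `crout_tridiagonal` and the 0-based list layout are assumptions of the example, not part of the text, and exact rational arithmetic is used so that the entries of P' can be compared with the worked illustration above.

```python
from fractions import Fraction as Fr

def crout_tridiagonal(e, d, f, c):
    """Solve a tridiagonal system by (10.8.2)-(10.8.6).

    e, d, f, c hold the sub-, main-, super-diagonal and right-hand entries
    of each row (0-based); e[0] and f[n-1] are never read.
    Returns the solution x, the primed diagonal d', and the determinant.
    """
    n = len(d)
    dp = [None] * n          # d'_i, second column of P'
    fp = [None] * n          # f'_i, third column of P'
    cp = [None] * n          # c'_i, last column of P'
    dp[0] = d[0]
    cp[0] = c[0] / dp[0]
    if n > 1:
        fp[0] = f[0] / dp[0]
    for i in range(1, n):
        dp[i] = d[i] - e[i] * fp[i - 1]            # (10.8.3)
        if i < n - 1:
            fp[i] = f[i] / dp[i]                   # (10.8.4)
        cp[i] = (c[i] - e[i] * cp[i - 1]) / dp[i]  # (10.8.5)
    x = [None] * n
    x[n - 1] = cp[n - 1]                           # (10.8.6)
    for i in range(n - 2, -1, -1):
        x[i] = cp[i] - fp[i] * x[i + 1]
    det = 1
    for v in dp:                                   # product of second column of P'
        det *= v
    return x, dp, det

# The four-equation illustration of this section.
e = [None, Fr(-1), Fr(-2), Fr(-3)]
d = [Fr(2), Fr(3), Fr(4), Fr(5)]
f = [Fr(-1), Fr(-2), Fr(-3), None]
c = [Fr(6), Fr(1), Fr(-2), Fr(1)]
x, d_prime, det = crout_tridiagonal(e, d, f, c)
print(x, d_prime, det)
```

No pivoting is attempted here, so the sketch presumes that none of the primed diagonal elements vanishes.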


10.9 Iterative Methods and Relaxation 


In many sets of linear equations which arise in practice, the equations can be
ordered in such a way that the coefficient of $x_k$ in the kth equation is large in
magnitude relative to all other coefficients in that equation. Such sets are often
amenable to an iterative process in which the set is first rewritten in the form

$$
\begin{aligned}
x_1 &= \frac{1}{a_{11}} \left( c_1 - a_{12} x_2 - a_{13} x_3 - \cdots - a_{1n} x_n \right) \\
x_2 &= \frac{1}{a_{22}} \left( c_2 - a_{21} x_1 - a_{23} x_3 - \cdots - a_{2n} x_n \right) \\
&\ \ \vdots \\
x_n &= \frac{1}{a_{nn}} \left( c_n - a_{n1} x_1 - a_{n2} x_2 - \cdots - a_{n,n-1} x_{n-1} \right)
\end{aligned}
\tag{10.9.1}
$$


The initial approximations may be taken to be

$$ x_1^{(0)} = \frac{c_1}{a_{11}} \qquad x_2^{(0)} = \frac{c_2}{a_{22}} \qquad \ldots \qquad x_n^{(0)} = \frac{c_n}{a_{nn}} \tag{10.9.2} $$

The next approximations then are obtained by replacing the unknowns in the 
right-hand members of (10.9.1) by these initial approximations, and the feed- 
back process is repeated in the hope that the input and output of a cycle even- 
tually will agree within the specified error tolerance. 

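Thus, in this process, which is often called Jacobi iteration, in each step the
current values $x_1, \ldots, x_n$ are all replaced by modified values
$x_1^*, \ldots, x_n^*$ in accordance with the equations

$$
\begin{aligned}
x_1^* &= \frac{1}{a_{11}} \left( c_1 - a_{12} x_2 - \cdots - a_{1n} x_n \right) \\
x_2^* &= \frac{1}{a_{22}} \left( c_2 - a_{21} x_1 - \cdots - a_{2n} x_n \right) \\
&\ \ \vdots \\
x_n^* &= \frac{1}{a_{nn}} \left( c_n - a_{n1} x_1 - \cdots - a_{n,n-1} x_{n-1} \right)
\end{aligned}
\tag{10.9.3}
$$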


In the case of the system (10.3.2), about ten such iterations are required for three- 
place accuracy. 
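
A minimal Python sketch of the Jacobi cycle (10.9.3), started from (10.9.2), is given here for orientation. The system (10.3.2) itself is not reproduced in this section, so the sketch uses the three-digit rounded coefficients and right-hand members that appear in the relaxation example later in this section; the function names are hypothetical.

```python
def jacobi(A, c, sweeps):
    """Jacobi iteration (10.9.3): every unknown is recomputed from the
    values of the preceding cycle only."""
    n = len(c)
    x = [c[i] / A[i][i] for i in range(n)]        # starting values (10.9.2)
    for _ in range(sweeps):
        x = [(c[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

def max_residual(A, c, x):
    """Largest residual magnitude |c_i - sum_j a_ij x_j|."""
    n = len(c)
    return max(abs(c[i] - sum(A[i][j] * x[j] for j in range(n)))
               for i in range(n))

# Three-digit rounded data (assumed stand-in for (10.3.2)).
A = [[9.37, 3.04, -2.44],
     [3.04, 6.18, 1.22],
     [-2.44, 1.22, 8.44]]
c = [9.23, 8.20, 3.93]

x_jacobi = jacobi(A, c, sweeps=100)
print(x_jacobi, max_residual(A, c, x_jacobi))
```

The list comprehension builds the whole new vector before it replaces the old one, which is what distinguishes this simultaneous scheme from a cyclic one.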

If the iteration is modified in such a way that each unknown in each
right-hand member is replaced by its most recently calculated approximation
rather than by the approximation afforded by the preceding cycle, the rate of
convergence depends upon the order in which the x’s are modified. In particular,
if they are modified cyclically in their natural order, the modified values are
related to the current values by the equations

$$
\begin{aligned}
x_1^* &= \frac{1}{a_{11}} \left( c_1 - a_{12} x_2 - \cdots - a_{1n} x_n \right) \\
x_2^* &= \frac{1}{a_{22}} \left( c_2 - a_{21} x_1^* - a_{23} x_3 - \cdots - a_{2n} x_n \right) \\
&\ \ \vdots \\
x_n^* &= \frac{1}{a_{nn}} \left( c_n - a_{n1} x_1^* - \cdots - a_{n,n-1} x_{n-1}^* \right)
\end{aligned}
\tag{10.9.4}
$$


In the case of (10.3.2), the number of cycles required to afford three-place
accuracy is reduced to about six. The procedure described by (10.9.4) is often
called Gauss-Seidel iteration, although its attribution to either Gauss or Seidel
appears to be improper.
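
The cyclic scheme (10.9.4) differs from the Jacobi sketch only in that each new value is used as soon as it is available. The following Python sketch is illustrative; it again assumes the three-digit rounded data from the relaxation example of this section in place of (10.3.2), and the function name is hypothetical.

```python
def gauss_seidel(A, c, sweeps):
    """Gauss-Seidel iteration (10.9.4): unknowns are modified cyclically in
    their natural order, each update using the most recent values."""
    n = len(c)
    x = [c[i] / A[i][i] for i in range(n)]        # starting values (10.9.2)
    for _ in range(sweeps):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (c[i] - s) / A[i][i]           # overwrite in place
    return x

A = [[9.37, 3.04, -2.44],
     [3.04, 6.18, 1.22],
     [-2.44, 1.22, 8.44]]
c = [9.23, 8.20, 3.93]

x_gs = gauss_seidel(A, c, sweeps=60)
gs_residual = max(abs(c[i] - sum(A[i][j] * x_gs[j] for j in range(3)))
                  for i in range(3))
print(x_gs, gs_residual)
```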

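In the frequently occurring cases when the coefficient matrix is real and
symmetric, so that $a_{ij} = a_{ji}$, and the diagonal elements are all positive, this
iteration will converge if and only if all the n quantities

$$
a_{11}, \quad
\begin{vmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{vmatrix}, \quad
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{vmatrix}, \quad
\ldots, \quad
\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{1n} & \cdots & a_{nn} \end{vmatrix}
$$

are positive (Reich [1949]).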

Another useful theorem (Collatz [1950]) states that both the Jacobi
and the Gauss-Seidel iterations will converge if the n × n matrix A has the two
following properties:


1 The matrix A does not contain a p × q submatrix of zeros, with
p + q = n.

2 The magnitude of each diagonal element of A is at least as large as the
sum of the magnitudes of the other elements in its row and, in at least one
case, is larger than that sum.


The first property of the matrix A is sometimes called irreducibility, and the 
second diagonal row dominance. It is important to notice that the two preceding 
conditions are sufficient but often are not necessary. In a wide class of situations 
(see Varga [1962]) the Gauss-Seidel iteration converges whenever the Jacobi 
iteration does and converges more rapidly. Although there are indeed cases 
in which the situation is reversed, the Gauss-Seidel iteration is almost always to 
be preferred. 
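
The second of the two properties, diagonal row dominance, is easy to test mechanically. The Python sketch below does only that; it does not test irreducibility (the first property), and the function name and return convention are assumptions of the example.

```python
def row_dominance(A):
    """Check diagonal row dominance: each |a_ii| is at least the sum of the
    magnitudes of the other elements in its row, with strict inequality in
    at least one row. Returns (flag, margins)."""
    n = len(A)
    margins = [abs(A[i][i]) - sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n)]
    weak_everywhere = all(m >= 0 for m in margins)
    strict_somewhere = any(m > 0 for m in margins)
    return weak_everywhere and strict_somewhere, margins

# Three-digit rounded matrix from the relaxation example of this section.
A = [[9.37, 3.04, -2.44],
     [3.04, 6.18, 1.22],
     [-2.44, 1.22, 8.44]]
dominant, margins = row_dominance(A)
print(dominant, margins)
```

Since the condition is sufficient but not necessary, a negative result says nothing about divergence.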




There exist many other numerical techniques for solving sets of linear
equations (see Sec. 10.20), some of which are direct methods (reductions) which
would yield the exact solution of a set of n equations in n unknowns after a
finite number of steps if no roundoffs were effected (as is true for the Gauss,
Gauss-Jordan, and Crout reductions), and others which are basically iterative
in the sense that generally an infinite sequence of approximations is generated,
with convergence to the solution in a certain class of situations (as in the Jacobi
and Gauss-Seidel processes).

Particular mention should be made of the method of conjugate gradients 
due to Hestenes and Stiefel [1952], which would terminate in the absence of 
roundoff, and of a rather extensive class of other gradient and related methods. 

At the other extreme there exist the so-called relaxation methods, 
apparently invented by Gauss and revived and popularized by Southwell 
[1940, 1946, 1956], in which the rapidity (or existence) of convergence may 
depend upon the ingenuity of the user. In applying a relaxation process to the 
solution of a set of equations of the form

$$
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= c_1 \\
&\ \ \vdots \\
a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n &= c_n
\end{aligned}
\tag{10.9.5}
$$

we first define residuals $R_1, R_2, \ldots, R_n$ by the equations

$$
\begin{aligned}
c_1 - a_{11} x_1 - a_{12} x_2 - \cdots - a_{1n} x_n &= R_1 \\
&\ \ \vdots \\
c_n - a_{n1} x_1 - a_{n2} x_2 - \cdots - a_{nn} x_n &= R_n
\end{aligned}
\tag{10.9.6}
$$


The unknowns $x_1, x_2, \ldots, x_n$ are then estimated, and the corresponding
residuals are calculated, after which the estimated values of the unknowns
are to be successively modified (one or more at a time) in such a way that the
magnitudes of all residuals are eventually reduced effectively to zero.

Reference to (10.9.6) shows that when $x_k$ is increased by unity, and all
other x’s are held fixed, $R_i$ decreases by $a_{ik}$. Thus the transpose of the coefficient
matrix

$$
\begin{bmatrix}
a_{11} & a_{21} & a_{31} & \cdots & a_{n1} \\
a_{12} & a_{22} & a_{32} & \cdots & a_{n2} \\
\vdots & \vdots & \vdots & & \vdots \\
a_{1n} & a_{2n} & a_{3n} & \cdots & a_{nn}
\end{bmatrix}
\tag{10.9.7}
$$

in which the rows and columns of the original matrix are interchanged, serves
as a relaxation table, in the sense that the successive entries in the kth row of
(10.9.7) represent the decreases in the successive residuals which correspond to a
unit increase in $x_k$. If the original coefficient matrix is symmetric, the matrix
(10.9.7) is then identical with it.

The situations which generally are most favorable to this process are those
in which each of the diagonal elements $a_{11}, a_{22}, \ldots, a_{nn}$ is large in magnitude
relative to the other elements in its row and column. For, in such cases, an
increase in $x_k$ of about $R_k / a_{kk}$ will nearly reduce $R_k$ to zero, but will affect the
other residuals by relatively small amounts, and subsequent modifications of
other x’s generally will not seriously nullify the effect of this reduction.

It may be seen that the Gauss-Seidel iteration consists of determining
successive corrections in exactly this way since, just before $x_k$ is modified, the
residual associated with the kth equation is

$$ R_k = c_k - a_{k1} x_1^* - \cdots - a_{k,k-1} x_{k-1}^* - a_{kk} x_k - a_{k,k+1} x_{k+1} - \cdots - a_{kn} x_n $$

and the modified $x_k$ is defined by the equation

$$ x_k^* = \frac{1}{a_{kk}} \left( R_k + a_{kk} x_k \right) = x_k + \frac{R_k}{a_{kk}} \tag{10.9.8} $$


in accordance with (10.9.4). Also, the Jacobi iteration is one in which all entries
are modified (“relaxed”) simultaneously according to the formulas

$$ x_k^* = x_k + \frac{R_k}{a_{kk}} \qquad (k = 1, 2, \ldots, n) $$

where the R’s are given by (10.9.6). For these reasons, Gauss-Seidel iteration
is sometimes referred to as cyclic relaxation, or cyclic displacement, and Jacobi
iteration as simultaneous relaxation, or simultaneous displacement.

The advantages associated with the more general relaxation process 
follow from the fact that the values of the residuals are known at each stage. 
Thus it is possible to focus attention at each stage on the residual of largest 
magnitude and either to reduce its magnitude effectively to zero or to proceed 
otherwise if an alternative procedure appears to be desirable. At the same time, 
the fact that an efficient use of the process in its full generality requires a decision 
after each step, not only places a premium on the ingenuity of the user, but makes 
it useful for large-scale computers only when relatively limited latitude is 
provided for variation in the relaxation technique from step to step. In the two 
special cases just mentioned, the residuals are in fact usually not calculated at 
all since the decision as to what modification is to be effected at each step is made 
in advance independently of the values of the residuals.

In this connection, we note the existence of an important iteration closely
related to the Gauss-Seidel process but in which the residuals are calculated
and in which (10.9.8) is replaced by the formula

$$ x_k^* = x_k + \omega \frac{R_k}{a_{kk}} \tag{10.9.9} $$

Here $\omega$ is a predetermined constant independent of k but dependent in a rather
complicated way upon the coefficient matrix of (10.9.5), so chosen that the rate
of convergence of the sequence of iterates is maximized relative to all other
values of $\omega$, including $\omega = 1$. Since it happens that always $\omega > 1$ (also
$\omega < 2$), this process is known as cyclic (or successive) overrelaxation. (See
Varga [1962] and Young [1971].)
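
A short Python sketch of the cyclic update (10.9.9) follows. The value $\omega = 1.1$ used below is an arbitrary illustrative choice, not the optimal value discussed in the text, and the data are the three-digit rounded coefficients from the relaxation example of this section; the function name is hypothetical. With $\omega = 1$ the sketch reduces to Gauss-Seidel iteration.

```python
def overrelaxation(A, c, omega, sweeps):
    """Cyclic overrelaxation: each unknown in turn is changed by
    omega * R_k / a_kk, as in (10.9.9), with R_k the current residual."""
    n = len(c)
    x = [0.0] * n
    for _ in range(sweeps):
        for k in range(n):
            R_k = c[k] - sum(A[k][j] * x[j] for j in range(n))
            x[k] += omega * R_k / A[k][k]
    return x

def max_residual(A, c, x):
    n = len(c)
    return max(abs(c[i] - sum(A[i][j] * x[j] for j in range(n)))
               for i in range(n))

A = [[9.37, 3.04, -2.44],
     [3.04, 6.18, 1.22],
     [-2.44, 1.22, 8.44]]
c = [9.23, 8.20, 3.93]

x_gs = overrelaxation(A, c, omega=1.0, sweeps=200)    # Gauss-Seidel
x_sor = overrelaxation(A, c, omega=1.1, sweeps=200)   # assumed omega
print(x_sor, max_residual(A, c, x_sor))
```
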
A typical sequence of relaxations, as applied by hand to the result of first
rounding all numerical coefficients and right-hand members in (10.3.2) to three
digits, is included for the purpose of illustrating the more general procedure:


Relaxation table:

      9.37      3.04     -2.44
      3.04      6.18      1.22
     -2.44      1.22      8.44

Calculation:

                               (9)       (6)       (8)
    Δx_1   Δx_2   Δx_3         R_1       R_2       R_3
      0      0      0          9.23      8.20      3.93
      1                       -0.14      5.16      6.37
                    1          2.30      3.94     -2.07
             1                -0.74     -2.24     -3.29
                              -7.40    -22.40    -32.90    × 10^-1
                   -4        -17.16    -17.52      0.86
            -3                -8.04      1.02      4.52
     -1                        1.33      4.06      2.08
             1                -1.71     -2.12      0.86
                             -17.10    -21.20      8.60    × 10^-2
            -4                -4.94      3.52     13.48
                    2         -0.06      1.08     -3.40
                              -0.60     10.80    -34.00    × 10^-3
                   -4        -10.36     15.68     -0.24
             3               -19.48     -2.86     -3.90
     -2                       -0.74      3.22     -8.78
                   -1         -3.18      4.44     -0.34
             1                -6.22     -1.74     -1.56
     -1                        3.15      1.30     -4.00


The relaxation table is written down immediately, and columns are provided
for successive changes in the estimated unknowns and for the successive values
of the three residuals. Values of the diagonal elements, rounded to the nearest
integer, are encircled above the corresponding residuals, for convenience in
estimating appropriate changes in the x’s.

Starting arbitrarily with the crude approximation $x_1 = x_2 = x_3 = 0$,
the initial residuals are then merely the right-hand members of the given
equations and are listed in the first row of the calculation. Since the largest
residual at this stage is $R_1 = 9.23$, we increase $x_1$ by the integer nearest
$R_1 / a_{11} \approx R_1 / 9$, and so enter a unit in the $\Delta x_1$ column and subtract unity
times the first row of the relaxation table from the row of residuals. At this stage
$R_3$ is largest in magnitude, and $x_3$ is increased by $1 \approx 6.37/8$, after which a unit
increase in $x_2$ is called for. At this stage, each residual is less than one-half the
corresponding rounded diagonal coefficient, and it is convenient to multiply the
residuals by a factor of 10. The subsequent changes in the x’s accordingly are then to
be divided by 10 when all the changes are eventually accumulated.

The approximate solution at the last stage of the tabulation given is
$x_1 = 0.897$, $x_2 = 0.764$, and $x_3 = 0.615$. It may be noticed that, with this
arrangement of the calculations, the entries in the relaxation table need only 
be multiplied by integers. Also, it is possible to avoid all intermediate roundoff 
without carrying more decimal places than are involved in the given data. In 
particular, the residuals corresponding to the three-place approximations 
obtained at the last stage given would be exactly 0.00315, 0.00130, and —0.00400 
if the given three-digit data were exact. However, it is desirable to accumulate 
the increments in the x’s, from time to time, and then to calculate the correspond- 
ing residuals directly, in order to avoid the propagation of the effects of gross 
errors. 
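
The hand calculation can be mimicked in Python with exact rational arithmetic. The sketch below encodes one reading of the worked example, which should be regarded as an assumption rather than a rule stated in the text: at each step the residual of largest magnitude is selected, the corresponding unknown is changed by the integer nearest to that residual divided by the rounded diagonal element, and the residuals are multiplied by 10 once none of them exceeds one-half the corresponding rounded diagonal element. The name `hand_relaxation` is hypothetical.

```python
from fractions import Fraction as Fr

A = [[Fr("9.37"), Fr("3.04"), Fr("-2.44")],
     [Fr("3.04"), Fr("6.18"), Fr("1.22")],
     [Fr("-2.44"), Fr("1.22"), Fr("8.44")]]
c = [Fr("9.23"), Fr("8.20"), Fr("3.93")]

def hand_relaxation(A, c, levels=4):
    """Integer-step relaxation with rescaling by 10 between levels
    (units, tenths, hundredths, thousandths for levels=4)."""
    n = len(c)
    diag = [round(A[i][i]) for i in range(n)]    # rounded diagonal elements
    x = [Fr(0)] * n
    R = list(c)                                  # residuals for x = 0
    scale = Fr(1)
    for level in range(levels):
        # Relax while some residual exceeds half its rounded diagonal element.
        while any(2 * abs(R[i]) > diag[i] for i in range(n)):
            k = max(range(n), key=lambda i: abs(R[i]))
            step = round(R[k] / diag[k])         # nearest integer change
            x[k] += step * scale
            # Column k of A is the kth row of the relaxation table (10.9.7).
            R = [R[i] - step * A[i][k] for i in range(n)]
        if level < levels - 1:
            R = [10 * r for r in R]
            scale /= 10
    return x, [r * scale for r in R]

x_relax, residuals = hand_relaxation(A, c)
print([float(v) for v in x_relax], [float(r) for r in residuals])
```

Because only integer multiples of the table entries are subtracted, no intermediate roundoff occurs, which is the point made in the text about carrying no more decimal places than the data contain.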

Iterative methods, in general, are appropriate for computer or desk use
in solving sets of equations principally when the number of equations is rather 
large and when the coefficient matrix is “sparse,” so that it contains relatively 
few nonzero elements. Even in such situations some users of computers tend to 
favor a direct method, particularly when that method takes advantage of the 
sparsity or can be made to do so. 


10.10 Iterative Methods for Nonlinear Equations 


Most of the useful methods for obtaining an approximate real solution of a real
equation of the form

$$ f(x) = 0 \tag{10.10.1} $$

involve iterative processes in which an initial approximation $z_0$ to a desired
real root $x = \alpha$ is obtained, by rough graphical methods or otherwise, and a
certain recurrence relation is used to generate a sequence of successive
approximations $z_1, z_2, \ldots, z_k, \ldots$ which converges (in a certain associated class of
cases) to the limit $\alpha$.

One such method is that of successive substitutions, in which (10.10.1) is
first rewritten in an equivalent form

$$ x = F(x) \tag{10.10.2} $$

and use is then made of the simple recurrence relation

$$ z_{k+1} = F(z_k) \tag{10.10.3} $$

Generally there are many convenient ways of rewriting (10.10.1) in the form
(10.10.2), and the convergence or divergence of the sequence of approximations
to $\alpha$ may depend upon the particular form chosen.

In order to see why this is so, we first assume that F(x) possesses a
continuous derivative on the closed interval bounded by $\alpha$ and $z_k$ and then notice
that since

$$ \alpha = F(\alpha) $$

Equation (10.10.3) implies the relation

$$ \alpha - z_{k+1} = F(\alpha) - F(z_k) = (\alpha - z_k) F'(\xi_k) \tag{10.10.4} $$

where $\xi_k$ lies between $z_k$ and $\alpha$. If the iteration converges, so that $z_k \to \alpha$, then
also $F'(\xi_k) \to F'(\alpha)$ as $k \to \infty$. Temporarily excluding the cases when $F'(\alpha) = 0$
and $F'(\alpha) = \pm 1$, we deduce that $\alpha - z_{k+1} \sim (\alpha - z_k) F'(\alpha)$ and hence also
that

$$ \alpha - z_k \sim A [F'(\alpha)]^k \qquad (k \to \infty) \tag{10.10.5} $$

where A is a certain constant; and this deviation in fact would grow unboundedly
in magnitude with increasing k if it were true that $|F'(\alpha)| > 1$. Thus it appears
that in order that the iteration converge to $x = \alpha$ as an infinite sequence, it is
necessary that $|F'(\alpha)| < 1$.

If we define the convergence factor $\rho_k$ as the ratio of the error in $z_{k+1}$
to the error in $z_k$, it follows that if $z_k$ is near $\alpha$, then $\rho_k \approx F'(\alpha)$. The number
$F'(\alpha)$ may be called the asymptotic convergence factor. Unless $|F'(\alpha)| \le 1$, a
small error in $z_k$ is increased in magnitude by the iteration, and we then say that
the iteration is asymptotically unstable at $\alpha$. When $|F'(\alpha)| > 1$, convergence to $\alpha$
could occur only in a finite number of steps, in consequence of an improbably
fortunate choice of the initial approximation $z_0$ (such as $z_0 = \alpha$).

When $|F'(\alpha)| = 1$, the asymptotic behavior of the corresponding
approximation sequence is unpredictable without further information. Finally, when
$F'(\alpha) = 0$, a sufficiently small value of $|\alpha - z_k|$ certainly leads to a smaller
value of $|\alpha - z_{k+1}|$, so that asymptotic stability is present, but (10.10.5) no
longer describes the nature of the convergence when it exists, in this case.

If $|F'(\alpha)| < 1$, so that the iteration is asymptotically stable at $\alpha$, and if the
initial approximation is sufficiently near to $\alpha$, the sequence of the iterates will
indeed converge to $\alpha$, in such a way that ultimately the successive approximations
tend toward $\alpha$ from one direction if $0 < F'(\alpha) < 1$ and oscillate about $\alpha$ with
decreasing amplitude if $-1 < F'(\alpha) < 0$. In the special cases when $F'(\alpha) = 0$,
the nature of the convergence depends upon the behavior of the higher derivatives
of F(x) near $x = \alpha$.

In illustration, a simple analysis (or a rough plot) of the function
$y = x^3 - x - 1$ shows that the real root of the equation

$$ f(x) = x^3 - x - 1 = 0 \tag{10.10.6} $$

is between x = 1 and x = 2, and is near x = 1.3. This equation can be
conveniently written in the form (10.10.2) in various ways, such as $x = x^3 - 1$,
$x = 1/(x^2 - 1)$, and $x = (x + 1)^{1/3}$. However, only the third (and least
convenient) of these particular forms is such that the derivative of the right-hand
member is smaller than unity in absolute value near x = 1.3. Hence, we may use
the recurrence formula

$$ z_{k+1} = (z_k + 1)^{1/3} $$

and, with $z_0 = 1.3$, then obtain the sequence $z_1 = 1.3200$, $z_2 = 1.3238$,
$z_3 = 1.3245$, $z_4 \approx z_5 \approx 1.3247$, when four decimal places are retained. The
true root is

$$ \alpha = 1.3247179572 $$

to 10 places.
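
The recurrence (10.10.3) for this example can be run directly. The Python sketch below is illustrative only: the function name is hypothetical, and the iterates are carried in full double precision rather than being rounded to four places at each step, so that they agree with the values quoted above only to about the fourth decimal.

```python
def successive_substitution(F, z0, steps):
    """Return [z_0, z_1, ..., z_steps] generated by z_{k+1} = F(z_k)."""
    z = [z0]
    for _ in range(steps):
        z.append(F(z[-1]))
    return z

# Third rewriting of (10.10.6): x = (x + 1)^(1/3).
F = lambda x: (x + 1.0) ** (1.0 / 3.0)

iterates = successive_substitution(F, 1.3, 30)
for k in range(6):
    print(k, round(iterates[k], 4))
print("limit:", iterates[-1])
```

The successive errors shrink by a factor of roughly $F'(\alpha) = 1/(3\alpha^2) \approx 0.19$ per step, in line with (10.10.5).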

More generally, for any differentiable function f(x), if an interval [a, b]
can be found such that f(a) and f(b) have opposite signs, and if f'(x) is of
constant sign in [a, b], then certainly f(x) has one and only one zero $x = \alpha$ inside
[a, b]. If the equation f(x) = 0 is rewritten as x = F(x) in such a way that

$$ |F'(x)| \le M < 1 \tag{10.10.7} $$

when $a \le x \le b$, then assuredly the iteration (10.10.3) is asymptotically stable
at $\alpha$. Furthermore, if $z_0$ is taken to be inside or at one end of [a, b], it then
follows from (10.10.4) that

$$ |\alpha - z_1| \le M |\alpha - z_0| < |\alpha - z_0| $$

so that $z_1$ is closer to $\alpha$ than is $z_0$. Consequently, one may be led to conclude
that also $|\alpha - z_2| \le M |\alpha - z_1| \le M^2 |\alpha - z_0|$ and, by induction, that
$|\alpha - z_k| \le M^k |\alpha - z_0|$, so that $z_k$ will necessarily converge to $\alpha$ as $k \to \infty$.

The flaw in this argument is that there is no certainty that $z_1$ in fact will
fall in [a, b], so that (10.10.7) is true when $x = z_1$, but only that this will be the
case when $z_0$ is close enough to $\alpha$.

An additional condition which clearly is sufficient to ensure that $z_1$ and
all subsequent $z_k$’s remain in [a, b], and hence that convergence to $\alpha$ will indeed
follow in the preceding situation, is the requirement that F(x) be such that
$a \le F(x) \le b$ for all x such that $a \le x \le b$.

Another sufficient condition for convergence with any choice of $z_0$ in
[a, b], assuming again that $\alpha$ is known to lie in that interval, is that

$$ 0 < F'(x) < 1 \qquad \text{when } a < x < b \tag{10.10.8} $$

This follows easily from the fact that if $z_k$ is in [a, b], then

$$
\begin{aligned}
z_{k+1} - z_k &= F(z_k) - z_k \\
&= (\alpha - z_k) - [F(\alpha) - F(z_k)] \\
&= (\alpha - z_k) [1 - F'(\xi_k)]
\end{aligned}
$$

where $a < \xi_k < b$. Hence, since (10.10.8) guarantees that $0 < 1 - F'(\xi_k) < 1$,
it follows that $z_{k+1} - z_k$ has the same sign as $\alpha - z_k$ and has a smaller magnitude.
Thus $z_{k+1}$ is between $z_k$ and $\alpha$ and hence also in [a, b].

In other cases, even though it is established that $|F'(\alpha)| < 1$, it may be
difficult to determine in advance whether convergence is ensured for a particular
$z_0$; and often one must determine by numerical calculation whether the initial
approximation is in fact sufficiently good.

In view of (10.10.5), we may notice that if $0 < |F'(\alpha)| < 1$, and if the
iteration (10.10.3) converges to $\alpha$, then the relation

$$ \alpha - z_k \approx A \beta^k \tag{10.10.9} $$

will be valid for some constants A and $\beta$, independent of k, when k is sufficiently
large. If we rewrite this relation with k replaced by k + 1 and by k + 2,
and eliminate the unknown A and $\beta$ from the resultant three relations, we may
deduce the approximation

$$ \frac{\alpha - z_{k+2}}{\alpha - z_{k+1}} \approx \frac{\alpha - z_{k+1}}{\alpha - z_k} $$

which yields the estimate

$$ \alpha \approx \frac{z_k z_{k+2} - z_{k+1}^2}{z_{k+2} - 2 z_{k+1} + z_k} $$

or, equivalently,

$$ \alpha \approx z_{k+2} - \frac{(z_{k+2} - z_{k+1})^2}{z_{k+2} - 2 z_{k+1} + z_k} = z_{k+2} - \frac{(\Delta z_{k+1})^2}{\Delta^2 z_k} \tag{10.10.10} $$

where

$$ \Delta z_k = z_{k+1} - z_k \qquad \Delta^2 z_k = \Delta z_{k+1} - \Delta z_k = z_{k+2} - 2 z_{k+1} + z_k $$

Thus, if three successive iterates $z_k$, $z_{k+1}$, and $z_{k+2}$ are known, this relation
affords an extrapolation which may be expected to provide an improved estimate
of $\alpha$, when the iteration converges. This procedure for accelerating convergence
is often called Aitken’s $\Delta^2$ process. In the preceding example, with $z_3 = 1.3245$,
$\Delta z_2 = 0.0007$, and $\Delta^2 z_1 = -0.0031$, to four places, (10.10.10) yields the
extrapolation $\alpha \approx 1.3245 + 0.0002 = 1.3247$, which happens to agree with $z_4$
to four places and is correct to those four places. If additional digits had been
retained in the calculation of the iterates $z_1$, $z_2$, and $z_3$, even though those digits
were not of apparent importance to the iterates themselves, the approximate
value of $\alpha$ obtained from them by an extrapolation based on (10.10.10) would
have been found to be correct to additional places.
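
The extrapolation (10.10.10) is a one-line computation. The Python sketch below (with a hypothetical function name) applies it first to the four-place iterates quoted above and then to the same three iterates carried in full double precision, to illustrate the remark about retaining additional digits.

```python
def aitken(z_k, z_k1, z_k2):
    """Aitken extrapolation (10.10.10) from three successive iterates."""
    delta = z_k2 - z_k1                   # Delta z_{k+1}
    delta2 = z_k2 - 2.0 * z_k1 + z_k      # Delta^2 z_k
    return z_k2 - delta * delta / delta2

# Four-place iterates z_1, z_2, z_3 from the example.
four_place = aitken(1.3200, 1.3238, 1.3245)

# The same iterates with all available digits retained.
F = lambda x: (x + 1.0) ** (1.0 / 3.0)
z1 = F(1.3)
z2 = F(z1)
z3 = F(z2)
full = aitken(z1, z2, z3)

print(round(four_place, 4), full)
```

If the second difference vanishes the formula is undefined, so a practical routine would need to guard against that case; the sketch does not.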

In a wide class of related methods for dealing with (10.10.1), a recurrence
formula of the type

$$ z_{k+1} = z_k - \frac{f(z_k)}{\gamma_k} \tag{10.10.11} $$

is used, with a suitable definition of the auxiliary sequence $\gamma_0, \gamma_1, \ldots, \gamma_k, \ldots$
The relation (10.10.3) can be specialized to (10.10.11) by writing
$F(x) = x - \phi(x) f(x)$, where $\phi(x)$ is a function such that $\phi(z_k) = 1/\gamma_k$. It should be
noticed, however, that the function F(x) - x relevant to the method of
successive substitutions is not necessarily proportional to f(x), but is required only
to be a function which vanishes at the required point $\alpha$ for which f vanishes.
Conversely, the explicit definition of a function $\phi(x)$ which takes on the chosen
value $1/\gamma_k$ when $x = z_k$ obviously is not necessary in the present case.

Since $f(\alpha) = 0$, the recurrence relation (10.10.11) implies the relation

$$ \alpha - z_{k+1} = \alpha - z_k - \frac{f(\alpha) - f(z_k)}{\gamma_k} = (\alpha - z_k) \left[ 1 - \frac{f'(\xi_k)}{\gamma_k} \right] \tag{10.10.12} $$

where $\xi_k$ is between $z_k$ and $\alpha$. Thus the convergence factor $\rho_k$ at the kth stage is
given, to a first approximation, by $1 - [f'(\alpha)/\gamma_k]$ when $z_k$ is near $\alpha$, and, unless
this factor is smaller than unity in magnitude, so that

$$ 0 < \frac{f'(\alpha)}{\gamma_k} < 2 \tag{10.10.13} $$

when k is large, convergence of $z_k$ to $\alpha$ generally cannot be obtained.

FIGURE 10.1


It is clear that if the z sequence converges, so that $z_{k+1} - z_k \to 0$, and
if $\gamma_k$ is bounded as $k \to \infty$, there then follows $f(z_k) \to 0$, so that $z_k$ tends to a
solution of (10.10.1). In particular, the requirement $z_{k+1} = \alpha$, where $f(\alpha) = 0$,
would imply that

$$ \gamma_k = -\frac{f_k}{\alpha - z_k} \tag{10.10.14} $$

where $f_k = f(z_k)$, so that $\gamma_k$ would then represent the slope of the secant line
joining the points $P_k(z_k, f_k)$ and $P(\alpha, 0)$ in Fig. 10.1. Thus it is desirable to define
the $\gamma$ sequence in such a way that this situation is approximated at each stage of
the calculation.

In the method of false position (regula falsi), the iteration is initiated by
finding $z_0$ and $z_1$ such that $f_0$ and $f_1$ are of opposite signs, and by defining $\gamma_1$
as the slope of the secant $P_0 P_1$ (Fig. 10.2), so that

$$ z_2 = z_1 - \frac{z_1 - z_0}{f_1 - f_0} f_1 = \frac{z_0 f_1 - z_1 f_0}{f_1 - f_0} \tag{10.10.15} $$

In each following iteration, $\gamma_k$ is taken as the slope of the line joining $P_k$ and the
most recently determined point at which the ordinate differs in sign from that at
$P_k$. The procedure is seen to be merely iterated linear inverse interpolation and
is clearly certain to converge, although the rate of convergence may be slow.
In the case of the preceding example, in which $f(x) = x^3 - x - 1$, with the
starting values $z_0 = 1.3$ and $z_1 = 1.4$, the next three iterates may be found as
follows:


    z_k          f_k           1/γ_k      -f_k/γ_k

    1.3         -0.103
    1.4          0.344         0.224      -0.077
    1.323       -0.00731       0.219       0.0016
    1.3246      -0.000503      0.219       0.000110
    1.324710




FIGURE 10.2 


As this example illustrates, the factor $\gamma_k$ often changes slowly after the first
few steps, and the rate of convergence then is not significantly reduced if, from
such a stage onward, $\gamma_k$ is assigned a constant value.


In this illustration, the approximation $z_4$ was obtained by interpolation
based on $z_3$ and $z_1$, in accordance with the preceding description of the
procedure. If, instead, the last two abscissas available are used, so that here $z_4$ is
obtained by extrapolation based on $z_3$ and $z_2$, with $1/\gamma_3 \approx 0.235$, a better
approximation (1.324718) is obtained. More generally, whereas the systematic
use of the slope of the secant $P_{k-1} P_k$ cannot be guaranteed to yield a convergent
sequence when it requires extrapolation, this modified procedure is usually
advantageous when it does converge (see Sec. 10.12). It is often called the
secant method and can be described by the iteration formula

$$ z_{k+1} = z_k - \frac{z_k - z_{k-1}}{f_k - f_{k-1}} f_k = z_k - \frac{f_k}{f[z_{k-1}, z_k]} \tag{10.10.16} $$

with the divided-difference notation of Chap. 2.
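
Both procedures can be compared on the example $f(x) = x^3 - x - 1$ with a short Python sketch. The function names and stopping rules are assumptions of the sketch; the iterates are carried in double precision, so they match the tabulated values only to the places shown there.

```python
def f(x):
    return x ** 3 - x - 1.0

def false_position(f, z0, z1, steps):
    """False position: gamma_k is the slope of the line joining P_k and the
    most recent point whose ordinate has the opposite sign."""
    a, fa = z0, f(z0)            # most recent opposite-sign point
    b, fb = z1, f(z1)            # current point P_k
    if fa * fb >= 0:
        raise ValueError("f(z0) and f(z1) must have opposite signs")
    iterates = [z0, z1]
    for _ in range(steps):
        z = b - fb * (b - a) / (fb - fa)
        fz = f(z)
        iterates.append(z)
        if fz == 0.0:
            break
        if fz * fb < 0:          # sign changed: old P_k becomes the partner
            a, fa = b, fb
        b, fb = z, fz
    return iterates

def secant(f, z_prev, z, steps=20, tol=1e-14):
    """Secant method (10.10.16): always use the last two abscissas."""
    f_prev, fz = f(z_prev), f(z)
    for _ in range(steps):
        if fz == f_prev:         # secant is horizontal; cannot continue
            break
        z_next = z - fz * (z - z_prev) / (fz - f_prev)
        z_prev, f_prev, z = z, fz, z_next
        fz = f(z)
        if abs(z - z_prev) < tol:
            break
    return z

fp = false_position(f, 1.3, 1.4, 25)
one_secant_step = secant(f, 1.323, 1.3246, steps=1)   # extrapolation in the text
secant_root = secant(f, 1.3, 1.4)
print(fp[2:5], one_secant_step, secant_root)
```

The false-position routine keeps the root bracketed at every step, whereas the secant routine gives up that guarantee in exchange for faster convergence when it does converge.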

Another simple modification of the method of false position, with desirable
properties, can be deduced by applying to it the Aitken acceleration process.
It is noted first that, except in the special cases when $f''(\alpha) = 0$, the
false-position iteration ultimately has one stationary end point (at which $f f'' > 0$),
this situation occurring when $f''$ is of constant sign between $x = z_k$ and
$x = z_{k+1}$. [See Prob. 42(a). In the preceding example, $z_1 = 1.4$ is stationary.]
The sequence of the complementary end points (at which $f f'' < 0$) then tends
to $\alpha$ monotonically, and the resultant one-sided approach may be slow.

To simplify the notation, we suppose here that $z_0$ is a stationary point and
that $z_1$ is separated from $z_0$ by the desired root $\alpha$, with $f''(x) \ne 0$ in $[z_0, z_1]$.
The false-position iteration then specializes to the form

$$ z_{k+1} = z_k - \frac{z_k - z_0}{f_k - f_0} f_k \tag{10.10.17} $$

Consequently, there follows also

$$ z_{k+2} - z_{k+1} = -\frac{z_{k+1} - z_0}{f_{k+1} - f_0} f_{k+1} \qquad z_{k+1} - z_k = \frac{f_k}{f_0} (z_{k+1} - z_0) $$

and the introduction of these relations into the Aitken approximation (10.10.10)
can be expressed in the convenient form

$$ \alpha \approx z_{k+1} - \frac{z_{k+1} - z_0}{f_{k+1} - (1 - f_{k+1}/f_k) f_0} f_{k+1} \tag{10.10.18} $$

after a little manipulation. The right-hand member of this relation then can be
used to provide a modified definition of $z_{k+2}$ so that, when k then is replaced
by k - 1, following approximations to $\alpha$ may be generated by the relation

$$ z_{k+1} = z_k - \frac{z_k - z_0}{f_k - \mu_k f_0} f_k \qquad (k \ge 2) \tag{10.10.19} $$

where

$$ \mu_k = 1 - \frac{f_k}{f_{k-1}} \tag{10.10.20} $$

in place of (10.10.17).
The process so defined, which could be called an accelerated false-position
iteration, differs from the classical process when $z_0$ is stationary only in that
$f_0$ is replaced by $\mu_k f_0$ in the calculation of $z_{k+1}$.

Calculation shows that if $f'(\alpha) f''(\alpha) \ne 0$, there follows

$$ |\alpha - z_{k+1}| \sim \lambda |\alpha - z_k| \tag{10.10.21} $$

where the asymptotic convergence factor is

$$ \lambda = 1 - \left[ \frac{(z_0 - \alpha) f'(\alpha)}{f(z_0)} \right]^{1/2} \tag{10.10.22} $$

When $|z_0 - \alpha|$ is small, there follows also

$$ \lambda \approx \frac{(z_0 - \alpha) f''(\alpha)}{4 f'(\alpha)} \tag{10.10.23} $$

so that $\lambda$ then also is small, as is desirable for rapid convergence. The
corresponding factor in the classical case is given by $1 - [(z_0 - \alpha) f'(\alpha) / f(z_0)]$,
from which it follows that when $|z_0 - \alpha|$ is small, the acceleration approximately
halves the asymptotic convergence factor.

FIGURE 10.3
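
A tentative Python sketch of the accelerated iteration follows. It assumes the reading of (10.10.19) and (10.10.20) in which $f_0$ is replaced by $\mu_k f_0$ with $\mu_k = 1 - f_k/f_{k-1}$, takes the first step from the classical form (10.10.17), and uses a residual tolerance as a stopping rule; the function name and the tolerance are assumptions of the sketch.

```python
def accelerated_false_position(f, z0, z1, steps=20, tol=1e-13):
    """Accelerated false position with stationary end point z0.

    The first step is the classical one; later steps replace f_0 by
    mu_k * f_0, where mu_k = 1 - f_k / f_{k-1}.
    """
    f0 = f(z0)
    f_prev = f(z1)
    z = z1 - (z1 - z0) * f_prev / (f_prev - f0)      # classical step gives z_2
    fz = f(z)
    for _ in range(steps):
        if abs(fz) < tol:                            # assumed stopping rule
            break
        mu = 1.0 - fz / f_prev
        z_next = z - (z - z0) * fz / (fz - mu * f0)
        f_prev, z, fz = fz, z_next, f(z_next)
    return z

f = lambda x: x ** 3 - x - 1.0

# In the example the end point 1.4 is stationary, so it plays the role of z_0.
root = accelerated_false_position(f, 1.4, 1.3)
print(root)
```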

A particularly simple iterative process, often called the bisection method,
consists of merely evaluating f(x) at the midpoint $x_3$ of the interval $[x_1, x_2]$
at the ends of which f(x) has opposite signs, discarding that one of $x_1$ and $x_2$
at which the ordinate has the same sign as $f(x_3)$ and repeating the process until
half the length of the subinterval inside which $\alpha$ continues to be trapped is
within the prescribed error tolerance. This process, in common with that of
false position, is certain to converge; but here, in addition, the number of steps
which will suffice is determinate in advance since after k steps the error will be
smaller than $2^{-k} (x_2 - x_1)$ if the approximation to $\alpha$ at that stage is taken to
be the abscissa of the midpoint of the remaining kth subinterval. Needless to
say, the requisite number of steps may well turn out to be intolerably large.
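
The bisection process just described can be sketched in Python as follows. The function name, the returned step count, and the choice of the interval [1, 2] and tolerance $10^{-6}$ for the example $f(x) = x^3 - x - 1$ are assumptions made for illustration.

```python
def bisection(f, x1, x2, tol):
    """Halve [x1, x2] until half its length is within tol.

    Returns the midpoint of the final subinterval and the number of
    halvings performed."""
    f1 = f(x1)
    if f1 * f(x2) >= 0:
        raise ValueError("f must have opposite signs at x1 and x2")
    steps = 0
    while (x2 - x1) / 2.0 > tol:
        x3 = (x1 + x2) / 2.0
        f3 = f(x3)
        if f3 == 0.0:
            return x3, steps
        if f1 * f3 > 0:          # same sign as at x1: discard x1
            x1, f1 = x3, f3
        else:                    # same sign as at x2: discard x2
            x2 = x3
        steps += 1
    return (x1 + x2) / 2.0, steps

f = lambda x: x ** 3 - x - 1.0
root, steps = bisection(f, 1.0, 2.0, 1e-6)
print(root, steps)
```

The step count depends only on the initial interval length and the tolerance, not on f, which is the sense in which the number of steps is determinate in advance.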


10.11 The Newton-Raphson Method 


An important method, associated with the names of Newton and Raphson,
consists of taking $\gamma_k$ in (10.10.11) as the slope of the curve y = f(x) at the point
$z_k$ (Fig. 10.3), so that (10.10.11) becomes

$$ z_{k+1} = z_k - \frac{f(z_k)}{f'(z_k)} \tag{10.11.1} $$


† A similar process, in which $\gamma_k$ is arbitrarily taken to be $2^{-k}$ (or to be $c^{-k}$ for some c)
is advocated by Hamming [1971]. Although successive approximations yielded by
this process frequently approach $\alpha$ rather rapidly for a while, they ultimately will
exhibit divergence except in very special cases.

This iteration is seen to be also the special case of (10.10.3) in which

$$ F(x) = x - \frac{f(x)}{f'(x)} $$

and hence $F'(x) = f(x) f''(x) / [f'(x)]^2$. Thus, if $f'(\alpha) \ne 0$ and $f''(\alpha)$ is finite,
there follows $F'(\alpha) = 0$, so that the convergence factor tends to zero when
and if $z_k \to \alpha$.
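In order to examine the behavior of the error $\alpha - z_k$, we rewrite (10.11.1)
in the equivalent form

$$ \alpha - z_{k+1} = \alpha - z_k + \frac{f(z_k) - f(\alpha)}{f'(z_k)} \tag{10.11.2} $$

and recall that

$$ f(\alpha) - f(z_k) = (\alpha - z_k) f'(z_k) + \tfrac{1}{2} (\alpha - z_k)^2 f''(\xi_k) $$

where $\xi_k$ lies between $z_k$ and $\alpha$, if $f''(x)$ is continuous in that interval, so that
(10.11.2) becomes

$$ \alpha - z_{k+1} = -\frac{1}{2} (\alpha - z_k)^2 \frac{f''(\xi_k)}{f'(z_k)} \tag{10.11.3} $$

Thus, if the iteration converges to $\alpha$, there follows

$$ \alpha - z_{k+1} \sim -(\alpha - z_k)^2 \frac{f''(\alpha)}{2 f'(\alpha)} \qquad (k \to \infty) \tag{10.11.4} $$

provided that $f'(\alpha)$ and $f''(\alpha)$ are both finite and nonzero.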

It is important to notice that here the error in $z_{k+1}$ tends to be proportional
to the square of the error in $z_k$, as $k \to \infty$, whereas in the other methods so far
considered the two successive errors generally tend to be in a constant ratio,
if the iteration converges. We say that such an iteration is a second-order
process, whereas the preceding methods generally are first-order processes. If
this method is applied to (10.10.6), the recurrence formula (10.11.1) becomes

$$ z_{k+1} = z_k - \frac{z_k^3 - z_k - 1}{3 z_k^2 - 1} = \frac{2 z_k^3 + 1}{3 z_k^2 - 1} $$

and, with $z_0 = 1.3$, the results of the first two iterations are $z_1 = 1.325$ and
$z_2 = 1.324718$ when rounded to the places given.
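
The recurrence (10.11.1) for this example can be checked with a few lines of Python. The sketch below uses a hypothetical function name and carries full double precision instead of rounding the intermediate iterates, so the second iterate agrees with the quoted value only to about six places.

```python
def newton(f, fprime, z0, steps):
    """Return [z_0, ..., z_steps] from the Newton-Raphson recurrence (10.11.1)."""
    z = [z0]
    for _ in range(steps):
        z.append(z[-1] - f(z[-1]) / fprime(z[-1]))
    return z

f = lambda x: x ** 3 - x - 1.0          # equation (10.10.6)
fprime = lambda x: 3.0 * x ** 2 - 1.0

z = newton(f, fprime, 1.3, 6)
for k, zk in enumerate(z):
    print(k, zk)
```

Printing the successive iterates shows the number of stable digits roughly doubling at each step, which is the practical meaning of a second-order process.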
Use can be made of (10.11.4) to predict in advance the probable number
of correct digits in each iterate. For, since here $f''/f'$ has a value of about 2
when $x = z_0 = 1.3$, it may be expected that the coefficient of $(\alpha - z_k)^2$ in
(10.11.4) will have a value approximating -1, so that the error $\varepsilon_k$ in the kth
iterate will be of magnitude approximately the square of that of the preceding
iterate and will be of negative sign, if the iteration converges to $\alpha$. If convergence
is assumed, and if initially it is known that the true value lies between 1.3 and
1.4 and hence that $z_0$ is in error by less than 0.1, it can be predicted that $z_1$ will
be in error by less than about 0.01, so that three places would be retained. With
$\varepsilon_0$ reestimated as $z_1 - z_0 \approx 0.03$, there follows $|\varepsilon_1| \approx 10^{-3}$. Hence $|\varepsilon_2|$ may be
expected to be less than about $10^{-6}$, so that seven places might be retained at
that stage. A comparison of $z_1$ and $z_2$ confirms the earlier prediction (although
this method of error estimation may be undependable in early stages, in other
cases) and suggests that the error in the next iterate $z_3$ will be in about the
twelfth decimal place. Rigorous error bounds naturally would be obtained by
use of (10.11.3) rather than the asymptotic formula (10.11.4).

FIGURE 10.4

If the curve representing y = f(x) possesses turning points or inflections
in the interval between the initial estimate x = z_0 and the true root x = α,
or between z_0 and z_1, the iteration may not converge to α, as is illustrated in
Fig. 10.4, although it may well converge to some other root. However, if
f'(x) and f''(x) do not change sign in the interval (z_0, α), and if f(z_0) and f''(z_0)
have the same sign,† so that the iteration is initiated at a point at which the
curve representing y = f(x) is concave away from the x axis (as, for example,
in Fig. 10.3), it is easily seen, by geometrical considerations or otherwise, that
successive iterates must tend to x = α and that they all lie between z_0 and α.
If f(z_0) and f''(z_0) have opposite signs, the first iterate z_1 is on the opposite side
of α and convergence to α is uncertain unless f'(x) and f''(x) also do not change
sign at x = α or in the interval (z_1, α), in which case convergence then follows
as before.

If α is a zero of multiplicity m, where m > 1, so that

$$f(\alpha) = f'(\alpha) = \cdots = f^{(m-1)}(\alpha) = 0 \qquad f^{(m)}(\alpha) \ne 0 \tag{10.11.5}$$

the relation (10.11.4) no longer holds, but the Newton-Raphson method then
can be shown to reduce to a first-order process. In such cases there are
modifications which restore the order to two. (See Prob. 68.)

† These conditions are associated with Fourier. For less restrictive sets of conditions
which also ensure convergence, see Henrici [1964], pp. 79-81, and Ostrowski [1966],
p. 44.

However, situations in which two (or more) zeros are coincident, or 
nearly so, generally are troublesome numerically since successive steps in the 
relevant iterative process then usually involve the evaluation of ratios of 
increasingly small quantities and high-precision computation is required (unless 
cancellation of factors or other simplification, by analytical methods, is possible). 
A method which often is useful in such cases is indicated in Prob. 55. When 
f(x) is a polynomial, an alternative method consists of seeking a quadratic
factor rather than a linear one (see Secs. 10.18 and 10.19). 

The preceding methods can be combined and modified in various ways. 
In particular, if f'(z_k) begins to change slowly with k after (say) r iterations, the
Newton-Raphson procedure may be modified by taking γ_k = f'(z_r) for all
k ≥ r or by recalculating the derivative (say) once out of every two or three
steps. The method of false position may be modified, for example, by taking
γ_k as the slope of the secant P_0P_1 for all k, where P_0 and P_1 are two fixed points
on the curve y = f(x) near to and separated by the point P at which x = α,
or by taking γ_k as the slope of the secant P_0P_k, where P_0 is an appropriately
chosen fixed point on the curve. Whereas such modifications lead to reductions 
in labor, their use clearly may also adversely affect the nature and the rate of 
convergence. 


10.12 Iterative Methods of Higher Order 


Any iteration in which it is true that

$$|\alpha - z_{k+1}| \sim A \, |\alpha - z_k|^r \qquad (k \to \infty) \tag{10.12.1}$$

with A ≠ 0, when the iteration converges, is said to be a process of order r,
which need not be an integer. The constant A is called the asymptotic error
constant. It may be seen that any iterative process of order exceeding unity
certainly will yield convergence to α if the iteration is initiated sufficiently near to
α. On the other hand, when r = 1, this statement generally cannot be made
unless the asymptotic convergence factor associated with α is smaller than 1
in absolute value. (The methods of false position and bisection are notable
favorable exceptions.)

It is of some interest to determine the order of the secant method (that is,
the variant of the method of false position in which linear extrapolation or
interpolation is always based on the two most recently determined ordinates).
In this case it is true that

$$\alpha - z_{k+1} \sim C (\alpha - z_k)(\alpha - z_{k-1}) \tag{10.12.2}$$

when the iteration converges, where C = −½ f''(α)/f'(α) (see Prob. 69). Assuming
that f'(α) and f''(α) are finite and nonzero, and denoting the unknown order
of the process by r, there must follow

$$A \, |\alpha - z_k|^r \sim |C| \, |\alpha - z_k| \left( \frac{|\alpha - z_k|}{A} \right)^{1/r}$$

Thus r must be the positive root of the equation

$$r = 1 + \frac{1}{r}$$

and hence

$$r = \frac{1 + \sqrt{5}}{2} \approx 1.618^{\dagger} \tag{10.12.3}$$

Since also

$$A = |C|^{r/(1+r)} = |C|^{r-1}$$

we conclude that here, with ε_k denoting the error α − z_k,

$$|\varepsilon_{k+1}| \sim \left| \frac{f''(\alpha)}{2 f'(\alpha)} \right|^{r-1} |\varepsilon_k|^r \tag{10.12.4}$$

where r is given by (10.12.3).
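A brief sketch of the secant iteration may make the order statement concrete. It is applied here to the cubic x^3 − x − 1 used earlier in the chapter; the starting pair 1.3, 1.4 and the function name `secant` are assumptions made only for this illustration.

```python
import math


def secant(f, z0, z1, steps):
    """Secant method: linear interpolation on the two latest ordinates."""
    zs = [z0, z1]
    for _ in range(steps):
        a, b = zs[-2], zs[-1]
        fa, fb = f(a), f(b)
        if fb == fa:          # secant is horizontal; no further step possible
            break
        zs.append(b - fb * (b - a) / (fb - fa))
    return zs


def f(x):
    return x**3 - x - 1


zs = secant(f, 1.3, 1.4, 5)
r = (1 + math.sqrt(5)) / 2    # the order given by (10.12.3)
for k in range(1, len(zs)):
    print(k, f"{zs[k]:.12f}", f"{zs[k] - zs[k - 1]:+.3e}")
print("order r =", r)
```

Each correction is roughly proportional to the product of the two preceding ones, which is the behavior expressed by (10.12.2).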

In addition to the processes so far considered, a great variety of other 
iterative methods for the purpose of approximating the zeros of functions have 
been devised and studied. In particular, families of processes of higher order 
(integral or nonintegral) are available for computation. Although very often it 
is true that the theoretical advantages of such formulas are more than offset by 
the additional time and/or effort per iteration required by their use, this is not 
always the case. Accordingly, a few interrelated examples are considered briefly
in this section; throughout the section it is assumed that the required zero is not
repeated, so that f'(α) ≠ 0.

† The reciprocal of this number is often called the golden mean.

As a generalization of the secant method, which employs linear interpolation
or extrapolation based on the values of f(x) at two points z_k and z_{k-1}
near the desired zero α, one may be led to investigate a quadratic process based
on three such values. The second-degree polynomial y(x) agreeing with f(x)
at the points z_k, z_{k-1}, and z_{k-2} could be expressed in the lagrangian form
(3.2.12) or in the newtonian divided-difference form (2.5.2). For present
purposes we choose the latter form

$$y(x) = f_k + (x - z_k) f[z_k, z_{k-1}] + (x - z_k)(x - z_{k-1}) f[z_k, z_{k-1}, z_{k-2}] \tag{10.12.5}$$

where

$$f[z_k, z_{k-1}] = \frac{f_k - f_{k-1}}{z_k - z_{k-1}} \qquad f[z_k, z_{k-1}, z_{k-2}] = \frac{f[z_k, z_{k-1}] - f[z_{k-1}, z_{k-2}]}{z_k - z_{k-2}} \tag{10.12.6}$$

With the convenient abbreviations

$$\lambda_k = \frac{f_k}{\omega_k} \qquad \mu_k = \frac{f[z_k, z_{k-1}, z_{k-2}]}{\omega_k} \tag{10.12.7}$$

where

$$\omega_k = f[z_k, z_{k-1}] + (z_k - z_{k-1}) f[z_k, z_{k-1}, z_{k-2}] = f[z_k, z_{k-1}] + \frac{f_k - f_{k-1}}{f[z_k, z_{k-1}]} f[z_k, z_{k-1}, z_{k-2}] \tag{10.12.8}$$

the equation y(x) = 0 takes the form

$$\mu_k (x - z_k)^2 + (x - z_k) + \lambda_k = 0 \tag{10.12.9}$$

and the identification of z_{k+1} with the proper root of this equation gives

$$z_{k+1} = z_k - \frac{2 \lambda_k}{1 + \sqrt{1 - 4 \lambda_k \mu_k}} \tag{10.12.10}$$

Here the ambiguous sign was chosen so that z_{k+1} = z_k when f(z_k) = 0.
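This formula is equivalent to a somewhat more complicated one obtained
by Muller using the lagrangian form of y(x), which has found some favor in
practice. For this iteration it is found that

$$\varepsilon_{k+1} \sim -\frac{f'''(\alpha)}{6 f'(\alpha)} \, \varepsilon_k \varepsilon_{k-1} \varepsilon_{k-2} \tag{10.12.11}$$

and, accordingly, that

$$|\varepsilon_{k+1}| \sim \left| \frac{f'''(\alpha)}{6 f'(\alpha)} \right|^{(r-1)/2} |\varepsilon_k|^r \tag{10.12.12}$$

where r is the real root of the equation

$$r = 1 + \frac{1}{r} + \frac{1}{r^2}$$

and hence

$$r \approx 1.84 \tag{10.12.13}$$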
It may be noted that despite the relative complexity of the Muller iteration, its
order is inferior to that of the Newton-Raphson iteration. In addition, its
use requires a square-root extraction in each step and hence also may provide
a nonreal approximation to a real zero. However, the fact that the value of the
derivative f'(z_k) is not needed in the kth step affords a computational advantage
in those cases when the evaluation of f'(z_k) is undesirable or impossible.

Now, just as the Newton-Raphson formula can be considered as the
“tangent formula” obtained by confluence of the two data points for the secant
formula, new formulas can be derived from (10.12.10) by letting either z_{k-1}
or both z_{k-1} and z_{k-2} tend to z_k in (10.12.10). In either case there follows

$$\omega_k = f[z_k, z_k] = f'_k$$

Thus, in the former case, if we relabel the distinct abscissa as z_{k-1}, the
iteration is again given by (10.12.10), where now

$$\lambda_k = \frac{f_k}{f'_k} \qquad \mu_k = \frac{f'_k - f[z_k, z_{k-1}]}{(z_k - z_{k-1}) f'_k} \tag{10.12.14}$$

This process corresponds to interpolating or extrapolating from z_k along a
parabola which is tangent to the curve at (z_k, f_k) and which also passes through
the point (z_{k-1}, f_{k-1}). It is found that here

$$\varepsilon_{k+1} \sim -\frac{f'''(\alpha)}{6 f'(\alpha)} \, \varepsilon_k^2 \varepsilon_{k-1} \tag{10.12.15}$$

and accordingly that

$$|\varepsilon_{k+1}| \sim \left| \frac{f'''(\alpha)}{6 f'(\alpha)} \right|^{(r-1)/2} |\varepsilon_k|^r \tag{10.12.16}$$

where

$$r = 1 + \sqrt{2} \approx 2.41 \tag{10.12.17}$$

In the confluent case when z_{k-1} and z_{k-2} are both made to coincide with
z_k, the iteration (10.12.10) becomes

$$z_{k+1} = z_k - \frac{2 f_k / f'_k}{1 + \sqrt{1 - 2 f_k f''_k / (f'_k)^2}} \tag{10.12.18}$$

and it is found that

$$\varepsilon_{k+1} \sim -\frac{f'''(\alpha)}{6 f'(\alpha)} \, \varepsilon_k^3 \tag{10.12.19}$$

so that the order is 3. Here the curve representing y(x) and that representing
f(x) agree when x = z_k and also have the same slope and the same curvature
at the point (z_k, f_k).
From these formulas we may derive more simple ones by introducing
approximations. For example, if (for small f_k) we introduce the approximation

$$\sqrt{1 - \frac{2 f_k f''_k}{(f'_k)^2}} \approx 1 - \frac{f_k f''_k}{(f'_k)^2}$$

in (10.12.18), we obtain the formula

$$z_{k+1} = z_k - \frac{f_k}{f'_k - f_k f''_k / (2 f'_k)} \tag{10.12.20}$$

for which

$$\varepsilon_{k+1} \sim \left\{ \left[ \frac{f''(\alpha)}{2 f'(\alpha)} \right]^2 - \frac{f'''(\alpha)}{6 f'(\alpha)} \right\} \varepsilon_k^3 \tag{10.12.21}$$

This formula is also of third order, but it does not require a square-root extraction.
It is the frequently rediscovered formula of Halley (see footnote on page 513).
Iterative approximation based on this formula is also sometimes called Bailey’s
method, or Lambert’s method.

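If we write

$$\frac{1}{1 - f_k f''_k / [2 (f'_k)^2]} \approx 1 + \frac{f_k f''_k}{2 (f'_k)^2}$$

in Halley’s formula, we obtain the iteration

$$z_{k+1} = z_k - \frac{f_k}{f'_k} \left[ 1 + \frac{f_k f''_k}{2 (f'_k)^2} \right] \tag{10.12.22}$$

for which

$$\varepsilon_{k+1} \sim \left\{ \frac{[f''(\alpha)]^2}{2 [f'(\alpha)]^2} - \frac{f'''(\alpha)}{6 f'(\alpha)} \right\} \varepsilon_k^3$$

This is the third-order iteration supplied by the result of Prob. 60 and sometimes
called Chebyshev’s formula.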
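The three third-order formulas of this section, the confluent form (10.12.18), Halley's formula (10.12.20), and Chebyshev's formula (10.12.22), can be compared in a small sketch. The step-function names are illustrative, and the test function is again the cubic x^3 − x − 1, chosen here only for continuity with the earlier examples. The square root in the first step function is real for this example; in general a complex square root might be needed.

```python
import math


def confluent_step(z, f, df, d2f):
    """One step of (10.12.18); assumes 1 - 2 f f''/(f')^2 >= 0."""
    fz, d1, d2 = f(z), df(z), d2f(z)
    return z - (2 * fz / d1) / (1 + math.sqrt(1 - 2 * fz * d2 / d1**2))


def halley_step(z, f, df, d2f):
    """One step of Halley's formula (10.12.20)."""
    fz, d1, d2 = f(z), df(z), d2f(z)
    return z - fz / (d1 - fz * d2 / (2 * d1))


def chebyshev_step(z, f, df, d2f):
    """One step of Chebyshev's formula (10.12.22)."""
    fz, d1, d2 = f(z), df(z), d2f(z)
    return z - (fz / d1) * (1 + fz * d2 / (2 * d1**2))


def f(x):
    return x**3 - x - 1


def df(x):
    return 3 * x**2 - 1


def d2f(x):
    return 6 * x


results = {}
for name, step in [("confluent", confluent_step),
                   ("halley", halley_step),
                   ("chebyshev", chebyshev_step)]:
    z = 1.3
    for _ in range(3):
        z = step(z, f, df, d2f)
    results[name] = z
    print(name, f"{z:.15f}")
```

All three agree to working precision after three steps from z = 1.3; the difference between them lies in the asymptotic error constant and in whether a square root is required.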

As a final illustration, which is also of practical significance, we start
with (10.12.10), notice that λ_k will be small when f_k is small, and hence
approximate (1 − 4μ_kλ_k)^{1/2} by 1 − 2μ_kλ_k, as was done in deriving (10.12.20). The
resultant formula is

$$z_{k+1} = z_k - \frac{\lambda_k}{1 - \mu_k \lambda_k} \tag{10.12.23}$$

or

$$z_{k+1} = z_k - \frac{f_k}{\omega_k - (f_k / \omega_k) f[z_k, z_{k-1}, z_{k-2}]} \tag{10.12.24}$$
If we replace ω_k in its first appearance here by the second form of its definition
(10.12.8) and replace it in its second (less significant) appearance by its
approximation f[z_k, z_{k-1}], the formula (10.12.24) takes the form

$$z_{k+1} = z_k - \frac{f_k}{f[z_k, z_{k-1}] - (f_{k-1} / f[z_k, z_{k-1}]) f[z_k, z_{k-1}, z_{k-2}]} \tag{10.12.25}$$

for which

$$|\varepsilon_{k+1}| \sim \left| \frac{f'''(\alpha)}{6 f'(\alpha)} \right|^{(r-1)/2} |\varepsilon_k|^r \tag{10.12.26}$$

where

$$r \approx 1.84 \tag{10.12.27}$$

This formula, due to Traub [1964], can be considered, not only as a 
simplified approximation to the Muller formula, but also as a divided-difference 
simulation to the Halley formula. It has the virtue that its order and asymptotic 
error coefficient are the same as those of Muller’s formula but that it does not 
require a root extraction. (The inviting replacement of f,_, by f, in Traub’s 
formula, however, would in fact reduce the order.) In comparison with the 
Halley formula, we see that the replacement of derivatives by divided differences 
does cost an order reduction from 3 to about 1.84. 
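Since the Muller formula (10.12.10) and the Traub formula (10.12.25) use the same divided differences, both can be sketched from one helper. The names below are illustrative, the starting triple 1.2, 1.3, 1.4 is an arbitrary choice, and the complex square root is used in the Muller step so that a negative radicand does not stop the computation.

```python
import cmath


def divided_differences(f, zk, zk1, zk2):
    """Return f_k, f_{k-1}, f[z_k,z_{k-1}], f[z_k,z_{k-1},z_{k-2}] of (10.12.6)."""
    fk, fk1, fk2 = f(zk), f(zk1), f(zk2)
    d1 = (fk - fk1) / (zk - zk1)
    d1_prev = (fk1 - fk2) / (zk1 - zk2)
    d2 = (d1 - d1_prev) / (zk - zk2)
    return fk, fk1, d1, d2


def muller_step(f, zk, zk1, zk2):
    fk, _, d1, d2 = divided_differences(f, zk, zk1, zk2)
    omega = d1 + (zk - zk1) * d2              # first form of (10.12.8)
    lam, mu = fk / omega, d2 / omega          # (10.12.7)
    return zk - 2 * lam / (1 + cmath.sqrt(1 - 4 * lam * mu))   # (10.12.10)


def traub_step(f, zk, zk1, zk2):
    fk, fk1, d1, d2 = divided_differences(f, zk, zk1, zk2)
    return zk - fk / (d1 - (fk1 / d1) * d2)   # (10.12.25), no square root


def iterate(step, f, zk2, zk1, zk, steps):
    for _ in range(steps):
        # The right-hand side is evaluated with the old triple.
        zk2, zk1, zk = zk1, zk, step(f, zk, zk1, zk2)
    return zk


def f(x):
    return x**3 - x - 1


root_muller = iterate(muller_step, f, 1.2, 1.3, 1.4, 4)
root_traub = iterate(traub_step, f, 1.2, 1.3, 1.4, 4)
print(root_muller, root_traub)
```

The Muller result is returned as a complex number with zero imaginary part in this example, which reflects the remark that the method can leave the real axis even for a real zero.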


10.13 Sets of Nonlinear Equations 


Some of the methods in preceding sections are readily generalized to the
treatment of two or more simultaneous nonlinear equations (algebraic or
transcendental). Thus, for example, the two simultaneous equations

$$f(x, y) = 0 \qquad g(x, y) = 0 \tag{10.13.1}$$

can be written (in various ways) in equivalent forms

$$x = F(x, y) \qquad y = G(x, y) \tag{10.13.2}$$

and the method of successive substitutions can be based on the recurrence
formulas

$$x_{k+1} = F(x_k, y_k) \qquad y_{k+1} = G(x_k, y_k) \tag{10.13.3}$$

When the iteration converges to the true solution pair, say, x = α and
y = β, it can be shown that the errors in the kth iterates tend to be described
by the relations

$$\alpha - x_k \approx A_1 \lambda_1^k + A_2 \lambda_2^k \qquad \beta - y_k \approx B_1 \lambda_1^k + B_2 \lambda_2^k$$

where A_1, A_2, B_1, and B_2 are constants, independent of k, and where λ_1 and
λ_2 are the roots of the equation

$$\begin{vmatrix} F_x - \lambda & F_y \\ G_x & G_y - \lambda \end{vmatrix} = 0$$

or

$$\lambda^2 - (F_x + G_y) \lambda + (F_x G_y - F_y G_x) = 0 \tag{10.13.4}$$

with the partial derivatives evaluated at (α, β), if F_xG_y ≠ F_yG_x at that point.
The constants A_1, B_1 and A_2, B_2 will be conjugate complex if the same is true
of λ_1, λ_2. Thus the iteration will be asymptotically stable at (α, β) if and only if
the roots λ_1 and λ_2 are smaller than unity in absolute value, the necessary and
sufficient conditions for which are

$$|F_x + G_y| < F_x G_y - F_y G_x + 1 < 2 \tag{10.13.5}$$

A more stringent pair of conditions, which is sufficient (but generally not
necessary) for asymptotic stability, is of the form

$$|F_x| + |F_y| < 1 \qquad |G_x| + |G_y| < 1 \tag{10.13.6}$$

As before, these conditions are not sufficient for convergence, in that the iteration
may fail to converge even though they are satisfied, unless the iteration is
started with (x_0, y_0) sufficiently near (α, β).

The preceding discussion generalizes in the obvious way to the case when
n simultaneous equations are involved, with n ≥ 3, except that no simple
generalization of (10.13.5) to n dimensions is available.

The Newton-Raphson iteration, as applied to the solution of (10.13.1), is
based on the result of replacing (α, β) by (x_{k+1}, y_{k+1}) in the right-hand members
of the Taylor expansions

$$\begin{aligned} 0 = f(\alpha, \beta) &= f(x_k, y_k) + (\alpha - x_k) f_x(x_k, y_k) + (\beta - y_k) f_y(x_k, y_k) + \cdots \\ 0 = g(\alpha, \beta) &= g(x_k, y_k) + (\alpha - x_k) g_x(x_k, y_k) + (\beta - y_k) g_y(x_k, y_k) + \cdots \end{aligned} \tag{10.13.7}$$

and neglecting nonlinear terms in x_{k+1} − x_k and y_{k+1} − y_k, so that the
recurrence formulas are of the form

$$\begin{aligned} (x_{k+1} - x_k) f_x(x_k, y_k) + (y_{k+1} - y_k) f_y(x_k, y_k) &= -f(x_k, y_k) \\ (x_{k+1} - x_k) g_x(x_k, y_k) + (y_{k+1} - y_k) g_y(x_k, y_k) &= -g(x_k, y_k) \end{aligned} \tag{10.13.8}$$

Rather than resolve these equations for x_{k+1} and y_{k+1}, it is usually convenient
to solve them, as written, for the corrections Δx_k = x_{k+1} − x_k and Δy_k =
y_{k+1} − y_k, which are to be added to x_k and y_k to yield the following iterates.
When the iteration converges, the errors in the (k + 1)th iterates generally tend 
to become linear combinations of the squares and products of the errors in the 
kth iterates (that is, the iteration is a second-order process), whereas, in the 
method of successive substitutions, based on (10.13.3), the new errors generally 
tend to become linear combinations of the preceding errors themselves. 
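The correction form of (10.13.8) can be sketched as follows. The pair of equations x^2 + y^2 = 4, xy = 1 is not taken from the text; it is a hypothetical test system chosen because its solution is known in closed form, and the function name `newton_2d` is likewise illustrative. The two linear equations are solved by Cramer's rule.

```python
import math


def newton_2d(f, g, fx, fy, gx, gy, x, y, steps):
    """Solve (10.13.8) for the corrections and add them to (x, y)."""
    for _ in range(steps):
        a, b = fx(x, y), fy(x, y)
        c, d = gx(x, y), gy(x, y)
        jac = a * d - b * c
        if jac == 0.0:
            raise ZeroDivisionError("Jacobian vanishes at the iterate")
        rf, rg = -f(x, y), -g(x, y)
        dx = (rf * d - b * rg) / jac    # Cramer's rule for the 2 x 2 system
        dy = (a * rg - c * rf) / jac
        x, y = x + dx, y + dy
    return x, y


# Hypothetical test system: x^2 + y^2 = 4 and x*y = 1.
x, y = newton_2d(
    lambda x, y: x * x + y * y - 4, lambda x, y: x * y - 1,
    lambda x, y: 2 * x, lambda x, y: 2 * y,
    lambda x, y: y, lambda x, y: x,
    2.0, 0.5, 6)

exact = ((math.sqrt(6) + math.sqrt(2)) / 2, (math.sqrt(6) - math.sqrt(2)) / 2)
print(x, y, exact)
```

Working with the corrections rather than with the new iterates keeps the right-hand members small near convergence, which is the convenience referred to above.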

The generalization to n dimensions is obvious. In the two-dimensional
case considered here, it can be seen that the Newton-Raphson process amounts
to approximating the surfaces representing the relations

$$z = f(x, y) \qquad z = g(x, y) \tag{10.13.9}$$

in three-dimensional space by their tangent planes at (x_k, y_k, f_k) and (x_k, y_k, g_k),
respectively, determining the common intersection (if it exists) of these planes
with the plane z = 0, and finally identifying (x_{k+1}, y_{k+1}) with the x and y
coordinates of that point.

When the so-called Jacobian determinant of f and g

$$J = J(f, g) = \begin{vmatrix} \partial f / \partial x & \partial f / \partial y \\ \partial g / \partial x & \partial g / \partial y \end{vmatrix} \tag{10.13.10}$$

vanishes at the point (x_k, y_k), Eqs. (10.13.8) do not possess a unique solution.
In this case, the lines in which the relevant tangent planes intersect the xy plane
are either parallel or coincident. More generally, if J vanishes at or near a
point (α, β) at which the curves f = 0 and g = 0 intersect, and if the higher
derivatives of f and g exist in that neighborhood, then either the curves are
tangent at their intersection or they possess, in fact, two (or more) nearly
coincident intersections. In either case, a particularly unfavorable behavior of
the Newton-Raphson sequence is to be anticipated.

One method of dealing with such a situation consists of first determining
the common solution of the equations

$$f = 0 \qquad J(f, g) = 0 \tag{10.13.11}$$

(or of g = 0 and J = 0) near the desired point (α, β) by Newton-Raphson
iteration or otherwise. The difficulty just discussed generally will not occur
also in this iteration unless (α, β) is one of three or more coincident or nearly
coincident zeros, in which case further modifications are needed. If the solution
of (10.13.11) is denoted by (x̄, ȳ), and if f(x, y) and g(x, y) are expanded in
powers of x − x̄ and y − ȳ, the conditions f(α, β) = 0 and g(α, β) = 0 then
become

$$(\alpha - \bar{x}) \bar{f}_x + (\beta - \bar{y}) \bar{f}_y + \tfrac{1}{2} (\alpha - \bar{x})^2 \bar{f}_{xx} + (\alpha - \bar{x})(\beta - \bar{y}) \bar{f}_{xy} + \tfrac{1}{2} (\beta - \bar{y})^2 \bar{f}_{yy} + \cdots = 0 \tag{10.13.12}$$

and

$$\bar{g} + (\alpha - \bar{x}) \bar{g}_x + (\beta - \bar{y}) \bar{g}_y + \tfrac{1}{2} (\alpha - \bar{x})^2 \bar{g}_{xx} + (\alpha - \bar{x})(\beta - \bar{y}) \bar{g}_{xy} + \tfrac{1}{2} (\beta - \bar{y})^2 \bar{g}_{yy} + \cdots = 0 \tag{10.13.13}$$

where f̄, f̄_x, ... abbreviate f(x̄, ȳ), f_x(x̄, ȳ), ..., and where account has been
taken of the fact that f̄ = 0.

Equations (10.13.12) and (10.13.13) can be simplified by use of the fact
that the condition J(f, g) = 0 at (x̄, ȳ) implies the equality of the ratios ḡ_x/f̄_x
and ḡ_y/f̄_y, so that

$$\bar{g}_x = k \bar{f}_x \qquad \bar{g}_y = k \bar{f}_y \tag{10.13.14}$$

for some value of k. Thus the terms involving ḡ_x and ḡ_y in (10.13.13) can be
eliminated by subtracting k times (10.13.12) from (10.13.13); and hence if
only second-degree terms are retained in the results, we obtain the two
approximate relations

$$(\alpha - \bar{x}) \bar{f}_x + (\beta - \bar{y}) \bar{f}_y + \tfrac{1}{2} (\alpha - \bar{x})^2 \bar{f}_{xx} + (\alpha - \bar{x})(\beta - \bar{y}) \bar{f}_{xy} + \tfrac{1}{2} (\beta - \bar{y})^2 \bar{f}_{yy} \approx 0 \tag{10.13.15}$$

and

$$a (\alpha - \bar{x})^2 + 2b (\alpha - \bar{x})(\beta - \bar{y}) + c (\beta - \bar{y})^2 \approx -2 \bar{g} \tag{10.13.16}$$

where

$$a = \bar{g}_{xx} - k \bar{f}_{xx} \qquad b = \bar{g}_{xy} - k \bar{f}_{xy} \qquad c = \bar{g}_{yy} - k \bar{f}_{yy} \tag{10.13.17}$$

Finally, if we write

$$\beta - \bar{y} = \lambda (\alpha - \bar{x}) \tag{10.13.18}$$

we can recast (10.13.15) and (10.13.16) in the forms

$$\lambda = -\frac{1}{\bar{f}_y} \left[ \bar{f}_x + \tfrac{1}{2} (\alpha - \bar{x}) (\bar{f}_{xx} + 2 \lambda \bar{f}_{xy} + \lambda^2 \bar{f}_{yy}) \right] \tag{10.13.19}$$

$$\alpha - \bar{x} = \pm \left( \frac{-2 \bar{g}}{a + 2b\lambda + c\lambda^2} \right)^{1/2} \tag{10.13.20}$$

which generally are convenient for use of a process of successive substitutions
for the determination of α and λ at each of two points, after which the
corresponding values of β are given by (10.13.18). The approximations to (α_1, β_1)
and (α_2, β_2) so obtained then generally will be sufficiently good to be suitable
for further refinement by a conventional method based on the original pair of
equations if this is desirable. This method appears to be due to Milne [1949].

In particularly difficult situations, when the two simultaneous equations of
(10.13.1) are to be solved, it is possible to make use of either the false-position
or the bisection method. For this purpose, suppose that the equation f(x, y) = 0
determines y as a continuous function y_f(x) near (α, β) and the equation
g(x, y) = 0 determines y as y_g(x). Then, if x_1 and x_2 can be determined such
that φ(x) = y_f(x) − y_g(x) has opposite signs at these two points, the iterative
procedures just mentioned are applicable to φ(x) with guaranteed convergence.

However, the probable slowness of the convergence here is coupled with
the fact that generally the functions y_f(x) and y_g(x) cannot be obtained explicitly.
Thus the equations f(x_k, y) = 0 and g(x_k, y) = 0 will have to be solved numerically
for y_f(x_k) and y_g(x_k), respectively, at each stage of the iteration by use (say) of
one of the preceding one-dimensional methods. Clearly, it may be desirable or
necessary in a specific case to interchange the roles of x and y in the preceding
description.

Although the other methods considered in this section generalize directly
to higher dimensions, unfortunately it is true that no method completely
analogous to the false-position or bisection methods exists in n-dimensional space
when n ≥ 3. However, there does exist a usually convergent method known as
the method of steepest descent. It is described here for the two-dimensional case,
but the generalization to n dimensions is immediate. If a solution (α, β) of the
simultaneous equations

$$f(x, y) = 0 \qquad g(x, y) = 0 \tag{10.13.21}$$

is desired, we first form the function φ(x, y) such that†

$$\varphi = \tfrac{1}{2} (f^2 + g^2) \tag{10.13.22}$$

The desired solution then clearly is specified by a point at which φ is minimized.
We recall next that the gradient vector ∇φ, with components {φ_x, φ_y},
has the property that at a point in the xy plane it is normal to the curve φ =
constant which passes through that point and, in addition, that its direction at
that point is the direction along which φ changes most rapidly with distance
from that point. Thus, if (x_k, y_k) is the current approximation to (α, β), we are
led to define the next approximation in such a way that

$$x_{k+1} = x_k + t u_k \qquad y_{k+1} = y_k + t v_k \tag{10.13.23}$$

where

$$u_k = \varphi_x(x_k, y_k) \qquad v_k = \varphi_y(x_k, y_k) \tag{10.13.24}$$

so that the change {Δx_k, Δy_k} is in the direction of most rapid variation of φ.
The amount of change is then determined in such a way that

$$\varphi(x_k + t u_k, \, y_k + t v_k) = \min \tag{10.13.25}$$

and hence t is to be determined by the equation

$$\frac{d}{dt} \varphi(x_k + t u_k, \, y_k + t v_k) = 0 \tag{10.13.26}$$

after which (x_{k+1}, y_{k+1}) is known.

† Other definitions of φ can also be used.

A somewhat simpler procedure consists of taking u_0 = 1, v_0 = 0, in the
first step; then u_1 = 0, v_1 = 1, in the second; then u_2 = 1, v_2 = 0, in the third,
and so forth, in (10.13.23) so that in each step only one of the components of
the current approximation is modified. When n unknowns are involved, one
may modify the components cyclically (in analogy to the Gauss-Seidel iteration)
or at each stage one may modify the particular component of the current
approximation which corresponds to the largest component of the current
gradient vector (in partial analogy to the general relaxation process).

As might be anticipated, the convergence may be adversely affected by
the presence of critical points where the Jacobian of f and g vanishes and by the
fact that the direction of most rapid change of the function φ at a point
unfortunately may differ significantly from the direction from that point to the
desired minimal point. In addition, the computation per step may be excessive
since the solution of (10.13.26) generally must be effected by an iterative method
at each stage of the process.
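The following sketch illustrates the gradient step (10.13.23) and (10.13.24) under a deliberate simplification: instead of solving (10.13.26) for t, the step is started at t = −1 and halved until φ decreases. That halving rule is a substitute introduced here for brevity, not the exact line minimization described above, and the test system x^2 + y^2 = 4, xy = 1 is hypothetical.

```python
def steepest_descent(f, g, grad_phi, x, y, steps):
    """Gradient steps on phi = (f^2 + g^2)/2 with simple step halving."""
    def phi(x, y):
        return 0.5 * (f(x, y) ** 2 + g(x, y) ** 2)    # (10.13.22)

    for _ in range(steps):
        u, v = grad_phi(x, y)                          # (10.13.24)
        t = -1.0                                       # negative t moves downhill
        while phi(x + t * u, y + t * v) >= phi(x, y):
            t *= 0.5
            if abs(t) < 1e-12:                         # no decrease found; stop
                return x, y
        x, y = x + t * u, y + t * v                    # (10.13.23)
    return x, y


def f(x, y):
    return x * x + y * y - 4


def g(x, y):
    return x * y - 1


def grad_phi(x, y):
    # phi_x = f f_x + g g_x, phi_y = f f_y + g g_y
    return (f(x, y) * 2 * x + g(x, y) * y,
            f(x, y) * 2 * y + g(x, y) * x)


phi_start = 0.5 * (f(2.0, 0.5) ** 2 + g(2.0, 0.5) ** 2)
x, y = steepest_descent(f, g, grad_phi, 2.0, 0.5, 50)
phi_end = 0.5 * (f(x, y) ** 2 + g(x, y) ** 2)
print(x, y, phi_start, phi_end)
```

Every accepted step lowers φ, but the rate of decrease can be slow, which is consistent with the cautions stated above.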


10.14 Iterated Synthetic Division of Polynomials. Lin’s Method 


When f(x) is a polynomial of degree n, so that the equation to be solved is an
algebraic one, methods such as those of preceding sections can be systematized
by the use of synthetic division. For this purpose, suppose that

$$f(x) = x^n + a_1 x^{n-1} + \cdots + a_{n-1} x + a_n \tag{10.14.1}$$

and, first, let f(x) be divided by the linear expression x − z, so that

$$f(x) = x^n + a_1 x^{n-1} + \cdots + a_{n-1} x + a_n = (x - z)(x^{n-1} + b_1 x^{n-2} + \cdots + b_{n-2} x + b_{n-1}) + R \tag{10.14.2}$$

where x^{n-1} + ··· + b_{n-1} represents the quotient, and R is the constant
remainder. Here the coefficients b_1, ..., b_{n-1} and the remainder R depend
upon z. By setting x = z in (10.14.2), it follows, in particular, that

$$R = f(z) \tag{10.14.3}$$

If now the quotient in (10.14.2) is again divided by x − z, so that

$$x^{n-1} + b_1 x^{n-2} + \cdots + b_{n-2} x + b_{n-1} = (x - z)(x^{n-2} + c_1 x^{n-3} + \cdots + c_{n-3} x + c_{n-2}) + R' \tag{10.14.4}$$

and hence

$$f(x) = (x - z)^2 (x^{n-2} + c_1 x^{n-3} + \cdots + c_{n-2}) + (x - z) R' + R$$

there follows also

$$R' = f'(z) \tag{10.14.5}$$

and, indeed, if the process is repeated k times, it is easily seen that the remainder
R^{(k)} is then f^{(k)}(z)/k!.

The method of synthetic division, often known as Horner’s method, is
based on the fact that, by equating coefficients of x^{n-1}, x^{n-2}, ..., x, and 1 in
the two members of (10.14.2), we obtain the relations

$$a_1 = b_1 - z \qquad a_2 = b_2 - z b_1 \qquad \cdots \qquad a_{n-1} = b_{n-1} - z b_{n-2} \qquad a_n = R - z b_{n-1}$$

Thus, if we introduce the recurrence formula

$$b_k = a_k + z b_{k-1} \qquad (k = 1, 2, \ldots, n) \tag{10.14.6}$$

with

$$b_0 = 1 \tag{10.14.7}$$

it follows that this formula will generate the coefficients of the quotient of
(10.14.2) with k = 1, 2, ..., n − 1, and also that

$$R = f(z) = b_n = a_n + z b_{n-1} \tag{10.14.8}$$

Further, the c’s in (10.14.4) are related, for k = 1, 2, ..., n − 2, to the b’s as
the b’s are related to the a’s, and there follows also

$$R' = f'(z) = c_{n-1} = b_{n-1} + z c_{n-2} \tag{10.14.9}$$

For desk calculation, it is convenient to arrange the entries in parallel
columns (or rows), in the form

    1          1            1
    a_1        b_1          c_1
    a_2        b_2          c_2
    ...        ...          ...
    a_{n-1}    b_{n-1}      c_{n-1} = R'
    a_n        b_n = R

so that each element is obtained by adding to its left-hand neighbor z times its
upward neighbor.

Thus, if the roots of the algebraic equation f(x) = 0 are x = α_1, α_2, ...,
α_n, and if the Newton-Raphson procedure is to be used to approximate one of
the roots, starting with an initial approximation z, the next approximation
z* is given simply by†

$$z^* = z - \frac{R}{R'} \tag{10.14.10}$$

and the process then can be repeated with z replaced by z*. This method of
computation tends to minimize the labor involved in evaluating the polynomials
f(z) and f'(z) and is sometimes known as the Birge-Vieta method.

In the simple case of the cubic equation (10.10.6), for which

$$f(x) = x^3 - x - 1$$

the first two iterations (starting with z = 1.3) would be tabulated as follows:

              z = 1.3                z = 1.325
     a        b          c           b            c
     1        1          1           1            1
     0        1.3        2.6         1.325        2.65
    -1        0.69       4.07        0.755625     4.267
    -1       -0.103                  0.001203
         Δz = 0.025              Δz = -0.000282


The approximation obtained at this stage is thus 1.324718, in accordance with 
the results obtained in the preceding section. 
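The tabulated b and c columns can be generated by a short sketch of the recurrence (10.14.6) and its repetition on the b's, followed by the correction (10.14.10). The function names are illustrative, and the coefficient list is assumed to start with the leading coefficient 1, as in (10.14.1).

```python
def synthetic_division(a, z):
    """Return the b and c columns for a = [1, a_1, ..., a_n] at the point z.

    b[-1] is the remainder R = f(z) and c[-1] is R' = f'(z).
    """
    b = [a[0]]
    for ak in a[1:]:
        b.append(ak + z * b[-1])      # b_k = a_k + z b_{k-1}, (10.14.6)
    c = [b[0]]
    for bk in b[1:-1]:
        c.append(bk + z * c[-1])      # same recurrence applied to the b's
    return b, c


def birge_vieta(a, z, steps):
    """Newton-Raphson with f and f' supplied by synthetic division."""
    for _ in range(steps):
        b, c = synthetic_division(a, z)
        z = z - b[-1] / c[-1]         # z* = z - R/R', (10.14.10)
    return z


cubic = [1, 0, -1, -1]                # x^3 - x - 1
b, c = synthetic_division(cubic, 1.3)
print(b, c)
root = birge_vieta(cubic, 1.3, 5)
print(root)
```

At z = 1.3 the columns end in R = −0.103 and R' = 4.07, which are the entries of the first half of the table.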

Once the iteration is terminated, so that one zero of f(x) is approximated 
and the last entry in the b column is effectively reduced to zero, the remaining 
entries in the b column are (approximately) the coefficients of the reduced 
polynomial, of degree n − 1, whose zeros are the remaining zeros of f(x).
Thus the remaining zeros can be obtained by solving equations of successively 
decreasing degree. Because of the errors propagated into the coefficients of those 
equations, however, it is desirable to use each zero so obtained as the starting 
value in a final correction run employing the coefficients of the original nth- 
degree polynomial. 


† Here and henceforth, in dealing with polynomials, we minimize the number of
indices used in a generic iteration formula by writing z for a typical member of an
approximation sequence and z* for the following member. Thus, if (10.14.10) were
to be used to approximate the root α_r, it would abbreviate a more specific formula
such as

$$z_{r,k+1} = z_{r,k} - \frac{R_k}{R'_k}$$
Other iterative methods involving the evaluation of derivatives, such as
those based on (10.12.18), (10.12.20), and (10.12.22), can be systematized
similarly when applied to algebraic equations.

A particularly simple procedure, due to S. N. Lin [1941, 1943], is based
on the fact that, by virtue of (10.14.8), the condition f(z) = 0 is equivalent to
the condition a_n + z b_{n-1} = 0. That is, if and only if the assumed value of z
were a root of f(x) = 0, then the corresponding value of b_{n-1} (which depends
upon z) would be such that z = −a_n/b_{n-1}(z). Lin’s iteration is the result of
applying the method of successive substitutions to the equation written in this
form, so that the revised estimate z* is defined by the formula

$$z^* = -\frac{a_n}{b_{n-1}} \tag{10.14.11}$$

and hence

$$z^* = z - \frac{a_n + z b_{n-1}}{b_{n-1}}$$

or, equivalently, by virtue of (10.14.8),

$$z^* = z - \frac{R}{b_{n-1}} \tag{10.14.12}$$

In this method, the formation of the c column is avoided, so that the labor per
iteration is reduced by nearly one-half. However, if this method is applied to the
example treated above, the first three iterations may be obtained as follows:

              z = 1.3     z = 1.45    z = 0.91    z = -5.8
     a        b           b           b
     1        1           1           1
     0        1.3         1.45        0.91
    -1        0.69        1.102      -0.172
    -1       -0.103       0.598      -1.157
         Δz = 0.15       -0.54       -6.7

Clearly, the iteration is not convergent in this case.
In order to investigate the Lin procedure more closely, we may notice
that, since (10.14.8) gives

$$b_{n-1}(z) = \frac{f(z) - a_n}{z}$$

the recurrence relation written in the form (10.14.11) can also be put in the form

$$z^* = -\frac{a_n z}{f(z) - a_n}$$

Thus Lin’s method is equivalent to applying the method of successive
substitutions to the result of writing f(x) = 0 in the form

$$x = -\frac{a_n x}{f(x) - a_n} \equiv F(x) \tag{10.14.13}$$

In the example just considered, (10.14.13) becomes x = 1/(x^2 − 1),
which, as was seen in Sec. 10.10, is not suitable for successive substitutions
since the convergence factor F'(x) has a value of about −5 near the real root,
whereas, for convergence, its absolute value should be smaller than unity.
In confirmation, we may notice that the error in z* = 1.45 is indeed about five
times the error in z = 1.3 and is of opposite sign.

More generally, we find from (10.14.13) that

$$F'(x) = a_n \, \frac{x f'(x) - f(x) + a_n}{[f(x) - a_n]^2}$$

and hence, at a zero α_r of f(x), Lin’s method possesses the asymptotic
convergence factor

$$\rho_r = F'(\alpha_r) = 1 + \frac{\alpha_r}{a_n} f'(\alpha_r) = 1 + \frac{\alpha_r f'(\alpha_r)}{f(0)} \tag{10.14.14}$$

Thus the result of applying Lin’s iteration to a good approximation to α_r will
lead to a poorer one unless |ρ_r| < 1; that is, unless the condition

$$|\rho_r| = \left| 1 + \frac{\alpha_r}{a_n} f'(\alpha_r) \right| < 1 \tag{10.14.15}$$

is satisfied, the iteration generally will not converge to α_r.

This criterion is a useful one if a rough approximation to α_r is known
initially, unless f'(x) varies rapidly near x = α_r. If we recall that a_n =
(−1)^n α_1α_2···α_n and that f'(α_r) = (α_r − α_1)···(α_r − α_n), where the factor
(α_r − α_r) is to be omitted, we may deduce that (10.14.15) can also be expressed
in the form

$$|\rho_r| = \left| 1 - \left( 1 - \frac{\alpha_r}{\alpha_1} \right) \left( 1 - \frac{\alpha_r}{\alpha_2} \right) \cdots \left( 1 - \frac{\alpha_r}{\alpha_n} \right) \right| < 1 \tag{10.14.16}$$

in terms of the remaining roots of f(x) = 0, the factor (1 − α_r/α_r) again
being omitted.

In the case of the equation

$$x^4 - 8x^3 + 23x^2 + 16x - 50 = 0 \tag{10.14.17}$$
a real root is easily seen to lie between x = 1 and x = 2. If Lin’s iteration is
used, starting with z = 1.5, the results of the first three iterations are as follows:

              z = 1.5     z = 1.39    z = 1.421   z = 1.4125
     a        b           b           b
     1        1           1           1
    -8       -6.5        -6.61       -6.579
    23       13.25       13.8121     13.6512
    16       35.875      35.1988     35.3984
   -50        3.8125     -1.0737      0.3011
         Δz = -0.11       0.031      -0.0085

The true roots of (10.14.17) are ±√2 and 4 ± 3i. The rate of convergence of the
Lin iteration, in this case, might have been predicted in advance by
approximating α_r by 1.5 in (10.14.14) to obtain ρ ≈ −1/3. The asymptotic convergence
factor is −0.25 to two places.†
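Both Lin tables, the divergent one for the cubic and the convergent one for (10.14.17), can be reproduced with a sketch of (10.14.12), together with an evaluation of the factor (10.14.14). The helper names are illustrative. No intermediate rounding is applied, so the iterates differ slightly from the tabulated ones (for example 1.3937 rather than 1.39).

```python
def lin_iterates(a, z, steps):
    """Lin's iteration (10.14.12) for a = [1, a_1, ..., a_n]."""
    zs = [z]
    for _ in range(steps):
        b = [a[0]]
        for ak in a[1:]:
            b.append(ak + zs[-1] * b[-1])     # (10.14.6); b[-1] ends as R
        zs.append(zs[-1] - b[-1] / b[-2])     # z* = z - R / b_{n-1}
    return zs


def lin_factor(a, alpha):
    """Convergence factor 1 + alpha f'(alpha)/a_n of (10.14.14)."""
    n = len(a) - 1
    dfa = sum((n - i) * a[i] * alpha ** (n - i - 1) for i in range(n))
    return 1 + alpha * dfa / a[-1]


cubic = [1, 0, -1, -1]                 # x^3 - x - 1
quartic = [1, -8, 23, 16, -50]         # (10.14.17)

cubic_zs = lin_iterates(cubic, 1.3, 3)
quartic_zs = lin_iterates(quartic, 1.5, 30)
print(cubic_zs)
print(quartic_zs[:4], quartic_zs[-1])
print(lin_factor(quartic, 1.5), lin_factor(quartic, 2 ** 0.5))
```

The factor evaluated at 1.5 is about −0.33 and at √2 about −0.25, in line with the predicted and asymptotic rates quoted above.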

It is of interest to notice that since

$$b_{n-1} = \frac{R - a_n}{z} = \frac{f(z) - f(0)}{z}$$

it follows that b_{n-1} is the slope of the secant joining the ordinate at x = 0
and the ordinate at x = z. Thus, (10.14.12) is equivalent to the result of taking
γ as the slope of that secant in the more general recurrence relation (10.10.11),
and the Lin iteration therefore amounts to determining z* by linear interpolation
(or extrapolation) based on the fixed ordinate f(0) and the most recently
calculated ordinate f(z) (see Fig. 10.5). Also the requirement (10.14.15) is easily
interpreted as demanding that the ratio of the slope of the curve at P to the slope
of the secant P_0P be positive and less than 2.

From this fact it may be deduced that when Lin’s iteration is unstable for
the determination of a zero α_r of a polynomial f(x), stability can be attained by
translating the origin to a new point x = c if that point is sufficiently near to α_r.
Clearly, the process then would amount to using f(c) as the fixed ordinate in
place of f(0) in Fig. 10.5. For example, if the origin is translated to c = 1.3
in the case of the previously considered equation x^3 − x − 1 = 0, the Lin
iteration becomes convergent when initiated at that point (see Prob. 90).

A simple alternative method for introducing or improving the asymptotic 
stability of the Lin iteration at a required zero a,, when a fair approximation to 
a, is known in advance, is easily devised. For this purpose, we note that if the 


† When the ratio of successive values of $\Delta z_k$ becomes nearly constant, that ratio
serves as an estimate of $\rho$, and a generally improved value of $\Delta z_k$ then is given by
$(\Delta z_k)(1 + \rho + \rho^2 + \cdots) = (\Delta z_k)/(1 - \rho)$.

FIGURE 10.5


Lin formula (10.14.12) is modified by the insertion of a parameter $\lambda$ into the form

$$z^* = z - \lambda \frac{R}{b_{n-1}} \qquad (10.14.18)$$

the asymptotic Lin convergence factor (10.14.14) at $\alpha_1$ is replaced by the factor

$$\rho_\lambda = 1 + \lambda \frac{\alpha_1 f'(\alpha_1)}{f(0)} \qquad (10.14.19)$$

If $\alpha_1$ were known exactly, $\lambda$ generally could be determined so that $\rho_\lambda = 0$,
and the iteration process in fact then would be of higher order. Instead, we may
replace $\alpha_1$ by a preliminary approximation $\bar{\alpha}_1$ and accordingly introduce the
definition

$$\lambda = -\frac{f(0)}{\bar{\alpha}_1 f'(\bar{\alpha}_1)} \qquad (10.14.20)$$

into (10.14.18).
In illustration, we again attempt the determination of the real root of
Eq. (10.10.6):

$$x^3 - x - 1 = 0$$

With $\bar{\alpha}_1 = 1.3$, (10.14.20) gives $\lambda \approx 0.189$, so that the modified Lin formula is

$$z^* = z - 0.189 \frac{R}{b_{n-1}}$$

and the first three iterations yield the following results:

    z =          1.3         1.328       1.3245      1.324733
      1          1           1           1
      0          1.3         1.328       1.3245
     -1          0.69        0.7636      0.75430
     -1         -0.103       0.0141     -0.000930
   Δz =          0.028      -0.0035      0.000233
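The modified iteration can be sketched as follows, assuming that the correction in (10.14.18) is $-\lambda\, b_n/b_{n-1}$ with $b_n = f(z)$ obtained by synthetic division, which is what the tabulated numbers suggest. The helper names are illustrative, not taken from the text.

```python
def horner(coeffs, x):
    """Evaluate a polynomial given by descending coefficients."""
    value = 0.0
    for c in coeffs:
        value = value * x + c
    return value


def derivative_coeffs(coeffs):
    """Descending coefficients of f'(x)."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def modified_lin(coeffs, alpha_bar, steps):
    """Lin iteration with the fixed parameter lambda of (10.14.20)."""
    lam = -coeffs[-1] / (alpha_bar * horner(derivative_coeffs(coeffs), alpha_bar))
    z = alpha_bar
    iterates = [z]
    for _ in range(steps):
        b = [coeffs[0]]
        for a_k in coeffs[1:]:          # synthetic division by (x - z)
            b.append(a_k + z * b[-1])
        z = z - lam * b[-1] / b[-2]     # b[-1] = f(z), b[-2] = b_{n-1}
        iterates.append(z)
    return lam, iterates


lam, iterates = modified_lin([1, 0, -1, -1], 1.3, 10)
print(round(lam, 3))
print(iterates[:4])
```

Because $\lambda$ is computed once from the preliminary approximation, each step costs one synthetic division, as in the unmodified Lin iteration.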


An even simpler iterative procedure would replace (10.14.12) by the
formula

$$z^* = z - \frac{R}{f'(\bar{\alpha}_1)} \qquad (10.14.21)$$

as a stationary simulation to the Newton-Raphson iteration, where the constant
$f'(\bar{\alpha}_1)$ again is precalculated as an approximation to $f'(\alpha_1)$ and is not to be
modified from step to step.

Both of these iterations generally are of first order, but they simulate 
second-order processes for a limited number of steps and require only about 
half as much computation per step as does the Newton-Raphson (Birge-Vieta) 
iteration. Ultimately, if the iterations were prolonged, both would become 
inferior to second-order processes. 

The preceding methods are valid, in principle, for the determination of 
complex roots as well as real ones (see Prob. 86). However, since a real initial 
approximation leads necessarily to real iterates, when the coefficients are real, 
the process then must be initiated with a nonreal initial estimate, and operations 
with complex numbers are involved in each step of the process. When the 
coefficients are real, the complex roots occur in conjugate pairs, and it is generally 
preferable to exploit this fact by seeking quadratic real factors rather than linear 
complex ones. A generalized method of synthetic division for this purpose is 
considered in Secs. 10.18 and 10.19. 


10.15 Determinacy of Zeros of Polynomials 


In those cases when the coefficients $a_1, \ldots, a_n$ of f(x) are inexact, it is desirable
to have estimates of the corresponding inherent possible errors in the roots of
f(x) = 0. If $\alpha_r$ is obtained as a root of the equation

$$f(\alpha_r) \equiv \alpha_r^n + a_1\alpha_r^{n-1} + \cdots + a_{n-1}\alpha_r + a_n = 0 \qquad (10.15.1)$$

whereas the true coefficients are $a_1 + \delta a_1, \ldots, a_n + \delta a_n$, then the corresponding
true root $\alpha_r + \delta\alpha_r$ must satisfy the equation

$$(\alpha_r + \delta\alpha_r)^n + (a_1 + \delta a_1)(\alpha_r + \delta\alpha_r)^{n-1} + \cdots + (a_{n-1} + \delta a_{n-1})(\alpha_r + \delta\alpha_r) + (a_n + \delta a_n) = 0 \qquad (10.15.2)$$

If the first equation is subtracted from the second, and if it is assumed that the
relative errors are sufficiently small to permit neglect of higher-order terms, it
follows that, to a first approximation, $\delta\alpha_r$ must satisfy the equation

$$[n\alpha_r^{n-1} + (n-1)a_1\alpha_r^{n-2} + \cdots + a_{n-1}]\,\delta\alpha_r + \alpha_r^{n-1}\,\delta a_1 + \alpha_r^{n-2}\,\delta a_2 + \cdots + \delta a_n = 0$$

and hence

$$\delta\alpha_r \approx -\frac{\alpha_r^{n-1}\,\delta a_1 + \alpha_r^{n-2}\,\delta a_2 + \cdots + \delta a_n}{f'(\alpha_r)} \qquad (10.15.3)$$

In particular, if each coefficient is known to be in error by no more than $\varepsilon$,

$$|\delta a_i| \le \varepsilon \qquad (i = 1, 2, \ldots, n) \qquad (10.15.4)$$

there follows, within the same degree of approximation,

$$|\delta\alpha_r|_{\max} \approx \frac{1 + |\alpha_r| + |\alpha_r|^2 + \cdots + |\alpha_r|^{n-1}}{|f'(\alpha_r)|}\,\varepsilon$$

or

$$|\delta\alpha_r|_{\max} \approx \frac{|\alpha_r|^n - 1}{(|\alpha_r| - 1)\,|f'(\alpha_r)|}\,\varepsilon \qquad (10.15.5)$$

when $|\alpha_r| \ne 1$. If $|\alpha_r| = 1$, this approximate bound becomes $n\varepsilon/|f'(\alpha_r)|$.
Clearly, unless the right-hand member of (10.15.3) or (10.15.5) is sufficiently
small to be consistent with the neglect of higher-order terms in its derivation,
these results should be regarded with suspicion.

In the case of the real root of (10.10.6), it is found that errors of magnitude
$\varepsilon$ in the coefficients would correspond to a maximum error of very nearly the
same magnitude in the root if $\varepsilon$ is small. In the case of the root $\alpha = \sqrt{2}$ of
(10.14.17), the maximum error in the approximation to the root is found to be
about one-sixth of the maximum error in the coefficients.

In terms of relative errors, (10.15.3) yields the approximation

$$\frac{\delta\alpha_r}{\alpha_r} \approx \sum_{k=1}^{n} c_k \frac{\delta a_k}{a_k} \qquad (10.15.6)$$

where

$$c_k = -\frac{a_k\,\alpha_r^{n-k-1}}{f'(\alpha_r)} \qquad (10.15.7)$$

Hence, in particular, if the magnitude of the relative error in each coefficient
does not exceed $\eta$,

$$\left|\frac{\delta a_i}{a_i}\right| \le \eta \qquad (10.15.8)$$

for i = 1, 2, ..., n, there follows

$$\left|\frac{\delta\alpha_r}{\alpha_r}\right|_{\max} \approx \eta \sum_{k=1}^{n} |c_k| \qquad (10.15.9)$$

Unpleasant situations exist in which small changes in certain coefficients
of a polynomial may lead to large changes in certain of its zeros. A well-known
example of Wilkinson [1959] involves the equation

$$f(x) \equiv (x + 1)(x + 2) \cdots (x + 20) = x^{20} + 210x^{19} + \cdots + 20! = 0 \qquad (10.15.10)$$

Here it happens that if the coefficient $a_1 = 210$ is changed by $2^{-23} \approx 1.2 \times 10^{-7}$,
the changes in the smaller roots are slight but, for example, the root $\alpha_{20}$
becomes -20.8 and the roots $\alpha_{16}$ and $\alpha_{17}$ become an imaginary root pair,
approximately $-16.7 \pm 2.8i$. In fact, a total of five pairs of zeros become
imaginary.

The relation (10.15.3) would in fact admit the possibility of considerably 
larger errors in this case but correspondingly would be useless for estimating 
them. In such a situation, the term ill conditioned is sometimes applied to the
polynomial, just as it is applied to a set of linear equations with a similar sort of 
instability. 
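A brief sketch shows how (10.15.3) behaves for this example. It assumes that only $a_1$ is perturbed, so that the first-order estimate reduces to $\delta\alpha_r \approx -\alpha_r^{19}\,\delta a_1 / f'(\alpha_r)$, and it uses the fact that for $\alpha_r = -r$ the derivative is the product of the remaining factors $(\alpha_r + j)$. Exact rational arithmetic avoids any roundoff in the estimate itself; the function name is illustrative.

```python
from fractions import Fraction
from math import prod


def first_order_shift(r, delta_a1, n=20):
    """First-order estimate (10.15.3) of the change in the root -r of
    (x + 1)(x + 2)...(x + n) when only a_1 is changed by delta_a1."""
    alpha = -r
    # f'(alpha) is the product of the remaining linear factors at x = alpha
    fprime = prod(alpha + j for j in range(1, n + 1) if j != r)
    return -(Fraction(alpha) ** (n - 1)) * delta_a1 / fprime


delta = Fraction(1, 2 ** 23)
shift_small_root = float(first_order_shift(1, delta))
shift_large_root = float(first_order_shift(16, delta))
print(shift_small_root)   # negligible for the root at -1
print(shift_large_root)   # far too large for a first-order estimate to be trusted
```

An estimate of several hundred for a root of magnitude 16 is inconsistent with the neglect of higher-order terms, which is the situation described above.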

Even when the coefficients are exact, or nearly so, the process of deflation, 
in which a linear factor $x - \alpha$ is extracted from a polynomial f(x) to yield a
polynomial g(x) of reduced degree, will introduce errors into the coefficients of
g(x) if $\alpha$ is inexactly known and a corresponding small residual remainder is
ignored. As pointed out in Sec. 10.14, error propagation naturally also takes 
place in self-deflating iteration processes such as those of Birge-Vieta and Lin. 
In all cases, it is highly desirable that approximate roots obtained from g(x) 
or from the results of subsequent deflations (by any method) be given final 
corrections by a method which uses the coefficients of the original polynomial. 

Usually it is desirable to extract factors corresponding to zeros of suc- 
cessively increasing magnitude, in order to minimize the generated relative 
errors. However, this procedure may be inconsistent with the desirability of 
beginning with the calculation of any zero or zeros for which the polynomial is 
ill-conditioned (see Probs. 102 and 103). 
10.16 Bernoulli’s Iteration


A method, originally due to Daniel Bernoulli, for obtaining roots of the algebraic
equation

$$x^n + a_1x^{n-1} + \cdots + a_{n-1}x + a_n = 0 \qquad (10.16.1)$$

is based on the related recurrence formula

$$\mu_k + a_1\mu_{k-1} + \cdots + a_{n-1}\mu_{k-n+1} + a_n\mu_{k-n} = 0 \qquad (10.16.2)$$

having the same coefficients as (10.16.1).
If the roots of (10.16.1) are $\alpha_1, \alpha_2, \ldots, \alpha_n$, and if (10.16.2) is considered
as a difference equation, its general solution is found to be†

$$\mu_k = C_1\alpha_1^k + C_2\alpha_2^k + C_3\alpha_3^k + \cdots + C_n\alpha_n^k \qquad (10.16.3)$$

where the n C’s are constants, independent of k, which are determined by the
values of $\mu_1, \mu_2, \ldots,$ and $\mu_n$, if no roots are repeated. Under this assumption,
let the roots be numbered in decreasing order of magnitude, so that $\alpha_1$ here
denotes the largest root in magnitude of (10.16.1). Then since (10.16.3) can be
written in the form

$$\mu_k = C_1\alpha_1^k\left[1 + \frac{C_2}{C_1}\left(\frac{\alpha_2}{\alpha_1}\right)^k + \frac{C_3}{C_1}\left(\frac{\alpha_3}{\alpha_1}\right)^k + \cdots + \frac{C_n}{C_1}\left(\frac{\alpha_n}{\alpha_1}\right)^k\right] \qquad (10.16.4)$$

if $C_1 \ne 0$, it follows that, in any sequence generated by (10.16.2), the kth
term is approximated by $C_1\alpha_1^k$ as $k \to \infty$ and, indeed, that the ratio

$$r_k = \frac{\mu_k}{\mu_{k-1}} \qquad (10.16.5)$$

tends to $\alpha_1$ as $k \to \infty$ if the largest root $\alpha_1$ is real and unrepeated and if no other
root has equal magnitude, unless $\mu_1, \mu_2, \ldots, \mu_n$ are so chosen that the
coefficient $C_1$ of $\alpha_1^k$ in (10.16.3) is zero.
If the largest root $\alpha_1$ is complex, and the coefficients of (10.16.1) are real,
then $\alpha_2$ is the complex conjugate of $\alpha_1$ and is of equal magnitude. If we write

$$\alpha_1 = \xi_1 + i\eta_1 = \beta_1 e^{i\phi_1} \qquad \alpha_2 = \bar{\alpha}_1 = \xi_1 - i\eta_1 = \beta_1 e^{-i\phi_1} \qquad (10.16.6)$$

where $\beta_1 > 0$ and $\xi_1$, $\eta_1$, $\beta_1$, and $\phi_1$ are real, the terms corresponding to $\alpha_1$
and $\alpha_2$ in (10.16.3) can be expressed in the real form

$$\beta_1^k (C_1 \cos k\phi_1 + C_2 \sin k\phi_1)$$

if $C_1$ and $C_2$ are replaced by $(C_1 - iC_2)/2$ and $(C_1 + iC_2)/2$, respectively,
in (10.16.3).

† If a solution of (10.16.2) is assumed in the form $\mu_k = \alpha^k$, it is found that the
characteristic equation determining admissible values of $\alpha$ is of the same form as (10.16.1).
Thus $\alpha_1^k, \alpha_2^k, \ldots, \alpha_n^k$ are all solutions, and superposition leads to (10.16.3), which can
be shown to represent the most general solution, if no roots are repeated, when only
integral values of k are considered.

Thus, if $\alpha_1$ and $\bar{\alpha}_1$ are not repeated and if all other roots are smaller in
magnitude than $\beta_1$, it follows that

$$\mu_k \approx \beta_1^k (C_1 \cos k\phi_1 + C_2 \sin k\phi_1) \qquad (k \to \infty) \qquad (10.16.7)$$

But, if $\mu_k$ were given exactly by the right-hand member of (10.16.7), it would
satisfy the recurrence relation

$$\mu_{k+1} - 2\mu_k\beta_1 \cos\phi_1 + \beta_1^2\mu_{k-1} = 0 \qquad (10.16.8)$$

and conversely, as is easily verified. A second relation, involving the two real
unknown quantities $\beta_1$ and $\phi_1$, then would be obtained, by replacing k by
k - 1, in the form

$$\mu_k - 2\mu_{k-1}\beta_1 \cos\phi_1 + \beta_1^2\mu_{k-2} = 0 \qquad (10.16.9)$$

The result of eliminating $\cos\phi_1$ from these two relations is

$$(\mu_{k-1}^2 - \mu_k\mu_{k-2})\,\beta_1^2 = \mu_k^2 - \mu_{k+1}\mu_{k-1} \qquad (10.16.10)$$

whereas the result of eliminating $\beta_1^2$ is

$$2(\mu_{k-1}^2 - \mu_k\mu_{k-2})\,\beta_1 \cos\phi_1 = \mu_k\mu_{k-1} - \mu_{k+1}\mu_{k-2} \qquad (10.16.11)$$

Thus, if we introduce the definitions

$$s_k = \mu_k^2 - \mu_{k+1}\mu_{k-1} \qquad t_k = \mu_k\mu_{k-1} - \mu_{k+1}\mu_{k-2} \qquad (10.16.12)$$

these approximate relations give

$$\beta_1^2 \approx \frac{s_k}{s_{k-1}} \qquad 2\beta_1 \cos\phi_1 = 2\xi_1 \approx \frac{t_k}{s_{k-1}} \qquad (10.16.13)$$

It follows that, unless it happens that $C_1 = C_2 = 0$ in (10.16.3), because of a
very special choice of $\mu_1, \mu_2, \ldots, \mu_n$, the ratios $s_k/s_{k-1}$ and $t_k/s_{k-1}$ will tend to
$\beta_1^2$ and $2\beta_1 \cos\phi_1$ as $k \to \infty$, from which limits the constants $\beta_1$ and $\phi_1$,
or $\xi_1$ and $\eta_1$, specifying the desired dominant complex root pair in (10.16.6),
can be calculated.

If $\alpha_1$ is a repeated real root, of multiplicity 2, so that $\alpha_2 = \alpha_1$, and all
other roots are of smaller magnitude, then the combination of terms
corresponding to $\alpha_1$ and $\alpha_2$ in (10.16.3) is of the form $\alpha_1^k(c_1 + c_2k)$. Since $\mu_k$
must then tend to such a form as $k \to \infty$, it follows that $\mu_k$ must tend to satisfy
the relation

$$\mu_{k+1} - 2\mu_k\alpha_1 + \mu_{k-1}\alpha_1^2 = 0 \qquad (10.16.14)$$

as $k \to \infty$. Whereas an approximation to $\alpha_1$, which tends to $\alpha_1$ as $k \to \infty$,
could be obtained as the appropriate one of the two roots of this equation, the
solution of a quadratic equation can be avoided by rewriting (10.16.14) with k
replaced by k - 1, and eliminating $\alpha_1^2$ from the two relations, to give

$$2\alpha_1 \approx \frac{t_k}{s_{k-1}} \qquad (10.16.15)$$

with the notation of (10.16.12).

Other exceptional cases, in which several roots have the same maximum
absolute value, can be treated in a similar way.

When the largest root $\alpha_1$ is real and unrepeated and there are no other
roots with the same absolute value, the ratio $r_k$ tends to $\alpha_1$, the rapidity of the
convergence depending upon the magnitude of the ratio $\alpha_2/\alpha_1$ of the two
largest roots. If $\alpha_1$ and $\alpha_2$ are conjugate complex, (10.16.7) shows that $r_k$
will tend to oscillate about the value zero (although the period of the oscillation
may comprise several iterations), whereas, if $\alpha_2 = \alpha_1$ [or $\alpha_2 \approx \alpha_1$], the
convergence of the ratio $r_k$ to $\alpha_1$ will be slow; here the ratio $t_k/s_{k-1}$ converges more
rapidly to $2\alpha_1$ [or to $\alpha_1 + \alpha_2$]. Thus, after several iterations, the behavior of
the sequence of r’s generally will indicate the true situation, and recourse can be
had to the appropriate choice between (10.16.13) and (10.16.15) when that
sequence is not acceptable. The more complicated situations seldom occur in
practice.

If $\alpha_1$ is real and unrepeated, the ideal situation would be that in which
$\mu_1, \mu_2, \ldots, \mu_n$ were so chosen that $C_2 = \cdots = C_n = 0$ in (10.16.3), so that
$\mu_1, \mu_2, \ldots, \mu_n$ would be respectively proportional to $\alpha_1, \alpha_1^2, \ldots, \alpha_1^n$. The
first calculated value of r, $r_{n+1} = \mu_{n+1}/\mu_n$, then clearly would be identical with
$\alpha_1$. In such cases, the starting values could be taken efficiently as successive
powers of a previously determined approximation to $\alpha_1$. If no information is
easily available with regard to the nature of the largest root or roots, the starting
values

$$\mu_1 = \mu_2 = \cdots = \mu_{n-1} = 0 \qquad \mu_n = 1$$

are often convenient. For this set of values it is easily seen that the undesirable
case $C_1 = 0$ cannot occur.
A particularly notable set of n starting values having the same property
is that determined by use of the formula

$$\mu_r = -(a_1\mu_{r-1} + a_2\mu_{r-2} + \cdots + a_{r-1}\mu_1 + r a_r) \qquad (r = 1, 2, \ldots, n) \qquad (10.16.16)$$

with $\mu_0 = \mu_{-1} = \cdots = 0$. For this set of starting values it can be shown†
that all the C’s in (10.16.3) are unity, and hence that $\mu_k$ is then identified with
the sum $\alpha_1^k + \alpha_2^k + \cdots + \alpha_n^k$ for all $k \ge 1$. Thus, in particular, if $|\alpha_1| >
|\alpha_2|, \ldots, |\alpha_n|$, there then follow both $\alpha_1 \approx \mu_k/\mu_{k-1}$ and $\alpha_1 \approx \mu_k^{1/k}$ when k is
sufficiently large. With the convention that $a_r = 0$ when r > n, it is seen that
the recurrence formula (10.16.16) is still applicable when r > n since it then
reduces to (10.16.2). This special procedure is closely related to the Graeffe
procedure described in the following section.

† The proof follows directly from Newton’s power-sum identities (see Theorem 13 of
Sec. 1.9, with $s_k = \mu_k$).
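The starting values of (10.16.16) are easily generated. The sketch below assumes a monic polynomial supplied as the list $[a_1, \ldots, a_n]$; the function name is illustrative. For a polynomial with known roots the output can be compared with the power sums directly.

```python
def newton_start_values(a):
    """Starting values mu_1, ..., mu_n from (10.16.16); a = [a_1, ..., a_n]."""
    n = len(a)
    mu = []                                  # mu[i] holds mu_{i+1}
    for r in range(1, n + 1):
        acc = r * a[r - 1]                   # the term r * a_r
        for j in range(1, r):
            acc += a[j - 1] * mu[r - j - 1]  # the term a_j * mu_{r-j}
        mu.append(-acc)
    return mu


# x^2 - 3x + 2 has roots 1 and 2, so the power sums are 3 and 5.
print(newton_start_values([-3, 2]))
# x^3 - x - 1, the example (10.10.6)
print(newton_start_values([0, -1, -1]))
```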

In the case of the example (10.10.6), with $f(x) = x^3 - x - 1$, the
Bernoulli recurrence relation is merely $\mu_k = \mu_{k-2} + \mu_{k-3}$. If the iteration is
begun with the starting values 1.30, 1.69, and 2.20, about sixteen iterations are
needed to establish the real root 1.3247... to five significant figures, although
here each iteration requires only a single addition. The remaining roots are
complex, with an absolute value of about 0.9, so that the ratio of the magnitude
of the dominant root to that of the subdominant root pair is about 1.5. The
relative slowness of the convergence is due to the relative nearness of this
ratio to unity. The fact that the subdominant roots are complex causes the
sequence of iterates to tend to its limit in an oscillatory manner.
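A minimal sketch of this calculation follows. It applies the recurrence (10.16.2) to the list $[a_1, \ldots, a_n]$ and records the ratios $r_k$; the function name and the number of steps are choices made here for illustration.

```python
def bernoulli_ratios(a, start, steps):
    """Extend the sequence mu by (10.16.2) and return the ratios r_k.

    a     = [a_1, ..., a_n] of the monic polynomial
    start = [mu_1, ..., mu_n]
    """
    mu = list(start)
    ratios = []
    for _ in range(steps):
        # mu_k = -(a_1 mu_{k-1} + a_2 mu_{k-2} + ... + a_n mu_{k-n})
        nxt = -sum(a_j * mu[-1 - j] for j, a_j in enumerate(a))
        ratios.append(nxt / mu[-1])
        mu.append(nxt)
    return ratios


ratios = bernoulli_ratios([0, -1, -1], [1.30, 1.69, 2.20], 40)
for k in (0, 5, 10, 15, 39):
    print(k + 1, ratios[k])     # the ratios approach 1.3247... in an oscillatory way
```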

In the case of the example (10.14.17), the recurrence relation is

$$\mu_k = 8\mu_{k-1} - 23\mu_{k-2} - 16\mu_{k-3} + 50\mu_{k-4}$$

and, with the arbitrarily chosen starting values 0, 0, 0, 1, the ensuing calculation
is as follows:

      μ_k       r_k         s_k         t_k     s_k/s_{k-1}   t_k/s_{k-1}
        8       8            23           8         --            --
       41       5.12        657         200       28.565         8.696
      128       3.12      16261        5224       24.750         7.951
        3       0.02     406537      130600       25.001         8.031
    -3176   -1059      10163401     3251272       25.000         7.997
   -25475       8.02

From the irregular behavior of the r sequence, it may be deduced that either
the process has not yet begun to converge satisfactorily or there is a pair of
dominant complex roots. To test the second hypothesis, the s and t sequences
are constructed, and the convergence of the sequences of ratios in the last two
columns is evident. The true dominant roots are $\xi_1 \pm i\eta_1 = 4 \pm 3i$, so that
$\beta_1^2 = \xi_1^2 + \eta_1^2 = 25$ and $2\xi_1 = 8$. The approximations afforded by the four
successive pairs of ratios are 4.348 ± 3.108i, 3.976 ± 2.990i, 4.016 ± 2.979i,
and 3.998 ± 3.002i.
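The table can be regenerated, and the ratios of (10.16.13) carried further, with the following sketch. Integer arithmetic keeps the sequence exact; the function names are illustrative.

```python
def bernoulli_sequence(a, start, steps):
    """Extend mu by the recurrence (10.16.2); a = [a_1, ..., a_n]."""
    mu = list(start)
    for _ in range(steps):
        mu.append(-sum(a_j * mu[-1 - j] for j, a_j in enumerate(a)))
    return mu


def dominant_pair_estimates(mu):
    """Estimates of beta_1^2 and 2*xi_1 from (10.16.13), using the last
    four terms mu_{k-2}, mu_{k-1}, mu_k, mu_{k+1} of the sequence."""
    m_km2, m_km1, m_k, m_kp1 = mu[-4:]
    s_k = m_k * m_k - m_kp1 * m_km1
    t_k = m_k * m_km1 - m_kp1 * m_km2
    s_prev = m_km1 * m_km1 - m_k * m_km2
    return s_k / s_prev, t_k / s_prev


a = [-8, 23, 16, -50]                       # coefficients a_1..a_4 of (10.14.17)
mu = bernoulli_sequence(a, [0, 0, 0, 1], 25)
print(mu[4:10])                             # the mu column of the table

beta_sq, two_xi = dominant_pair_estimates(mu)
xi = two_xi / 2
eta = (beta_sq - xi * xi) ** 0.5
print(xi, eta)                              # approaches 4 and 3
```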

The Bernoulli iteration has the useful property that it yields the dominant 
root (or roots) regardless of the starting values except in the unlikely (and 
avoidable) case when $C_1 = 0$ in (10.16.3), in which case another root or root
pair will result; that is, it is not necessary to initiate the iteration with a
sufficiently accurate approximation as is the case for many other iterative methods.
This fact is of particular importance in those cases when only complex roots are 
present, since even rough approximations then are not readily obtained. The 
calculation is remarkably simple (and readily mechanized) when the dominant 
root is real and unequaled in absolute value, and is not unduly complicated 
otherwise. 

Methods for improving the convergence of Bernoulli iteration are sug- 
gested in Probs. 106 and 109. 


10.17 Graeffe’s Root-squaring Technique 
Graeffe’s iterative method for determining roots of the algebraic equation

$$f(x) \equiv x^n + a_1x^{n-1} + a_2x^{n-2} + \cdots + a_{n-1}x + a_n = 0 \qquad (10.17.1)$$


consists of forming a sequence of equations, such that the roots of each equation 
are the squares of the roots of the preceding equation in the sequence, for the 
purpose of ultimately obtaining an equation whose roots are so widely separated 
in magnitude that they can be read approximately from the equation, by 
inspection. 

The principle of the method can be illustrated by a consideration of the
general equation of fourth degree, which can be written in the form

$$f(x) \equiv x^4 + a_1x^3 + a_2x^2 + a_3x + a_4 \equiv (x - \alpha_1)(x - \alpha_2)(x - \alpha_3)(x - \alpha_4) = 0 \qquad (10.17.2)$$

or, equivalently,

$$f(x) \equiv x^4 - (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4)x^3 + (\alpha_1\alpha_2 + \alpha_1\alpha_3 + \alpha_1\alpha_4 + \alpha_2\alpha_3 + \alpha_2\alpha_4 + \alpha_3\alpha_4)x^2 - (\alpha_1\alpha_2\alpha_3 + \alpha_1\alpha_2\alpha_4 + \alpha_1\alpha_3\alpha_4 + \alpha_2\alpha_3\alpha_4)x + \alpha_1\alpha_2\alpha_3\alpha_4 = 0 \qquad (10.17.3)$$

where $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ are the roots.
If the roots are all real and are widely separated in magnitude, so that
$|\alpha_1| \gg |\alpha_2| \gg |\alpha_3| \gg |\alpha_4|$, the result of retaining only the dominant part of each
coefficient in (10.17.3) is

$$x^4 - \alpha_1x^3 + \alpha_1\alpha_2x^2 - \alpha_1\alpha_2\alpha_3x + \alpha_1\alpha_2\alpha_3\alpha_4 \approx 0 \qquad (10.17.4)$$

Thus the four roots are given approximately, in this case, by equating to zero
the four linear expressions $x + a_1$, $a_1x + a_2$, $a_2x + a_3$, and $a_3x + a_4$.
If, say, $\alpha_1$ and $\alpha_2$ are conjugate complex, so that $\alpha_1 = \beta_1 e^{i\phi_1}$ and $\alpha_2 =
\beta_1 e^{-i\phi_1}$, and if also $|\alpha_1| = |\alpha_2| \gg |\alpha_3| \gg |\alpha_4|$, the approximation replacing
(10.17.4) is then

$$x^4 - 2\beta_1x^3 \cos\phi_1 + \beta_1^2x^2 - \beta_1^2\alpha_3x + \beta_1^2\alpha_3\alpha_4 \approx 0 \qquad (10.17.5a)$$

The complex roots are then approximated by the zeros of the quadratic $x^2 +
a_1x + a_2$, and the remaining roots are found by equating $a_2x + a_3$ and
$a_3x + a_4$ to zero. If, say, $\alpha_1 = \alpha_2$ and $|\alpha_1| = |\alpha_2| \gg |\alpha_3| \gg |\alpha_4|$, the
approximate relation is

$$x^4 - 2\alpha_1x^3 + \alpha_1^2x^2 - \alpha_1^2\alpha_3x + \alpha_1^2\alpha_3\alpha_4 \approx 0 \qquad (10.17.5b)$$

and the approximate roots are obtained in the same way. Other, more unusual
situations can be analyzed similarly.
The root-squaring process itself can be based on the fact that the product

$$(-1)^n f(x) f(-x) = (x^2 - \alpha_1^2)(x^2 - \alpha_2^2) \cdots (x^2 - \alpha_n^2) \qquad (10.17.6)$$

is a polynomial of degree n in $x^2$, whose zeros are the squares of the zeros of
f(x). Thus, if $f(x) = x^n + a_1x^{n-1} + a_2x^{n-2} + \cdots + a_{n-1}x + a_n$ is
multiplied, term by term, by

$$(-1)^n f(-x) = x^n - a_1x^{n-1} + a_2x^{n-2} - \cdots + (-1)^{n-1}a_{n-1}x + (-1)^n a_n,$$

and $x^2$ is then replaced by x, the result $f_2(x)$ is a polynomial of degree n with
zeros $\alpha_1^2, \ldots, \alpha_n^2$. By repeating the process, a polynomial $f_4(x)$ with zeros
$\alpha_1^4, \ldots, \alpha_n^4$ is obtained, then $f_8(x)$ with zeros $\alpha_i^8$, and so forth.

If all roots are real, unrepeated, and of distinct magnitudes, the iteration
may be concluded when the magnitude of each coefficient in an equation is the
square of the magnitude of the corresponding coefficient in the preceding
equation, within the tolerance adopted. Suppose that the original roots are
$\alpha_1, \ldots, \alpha_n$, and that k root squarings are needed, so that the roots of the final
equation are $\alpha_1^m, \ldots, \alpha_n^m$, where $m = 2^k$. If the final equation is of the form

$$f_m(x) \equiv x^n - A_1x^{n-1} + A_2x^{n-2} - \cdots + (-1)^{n-1}A_{n-1}x + (-1)^n A_n = 0 \qquad (10.17.7)$$

there then follows

$$\alpha_1^m \approx A_1 \qquad \alpha_2^m \approx \frac{A_2}{A_1} \qquad \alpha_3^m \approx \frac{A_3}{A_2} \qquad \cdots \qquad \alpha_n^m \approx \frac{A_n}{A_{n-1}} \qquad (10.17.8)$$

Each of the right-hand members will be positive, and the proper sign must
be chosen for the real mth root of each of these expressions by substitution of
the two possibilities into the original equation or otherwise.
A dominant double original root $\alpha_1$ would evidence itself by the fact that,
after k root squarings, the equation would be approximately of the form

$$f_m(x) \approx x^n - 2\alpha_1^m x^{n-1} + \alpha_1^{2m} x^{n-2} - (\alpha_1^2\alpha_3)^m x^{n-3} + \cdots = 0 \qquad (10.17.9)$$

where again $m = 2^k$, so that the magnitude of the coefficient of $x^{n-1}$ would
tend to be half the square of the magnitude of the corresponding coefficient
in the preceding equation. Similarly, if $\alpha_r$ were a double real root, and if no
other root were of equal magnitude, the coefficient of $x^{n-r}$ would have this
property. Thus $\alpha_r$ then would satisfy both of the relations

$$\alpha_r^{2m} \approx \frac{A_{r+1}}{A_{r-1}} \qquad \alpha_r^m \approx \frac{1}{2}\frac{A_r}{A_{r-1}} \qquad (10.17.10)$$

and would be determined as the real root, with appropriate sign, of either
equation.

A dominant conjugate complex root pair $\alpha_{1,2} = \beta_1 e^{\pm i\phi_1}$ would cause the
kth equation to be approximately of the form

$$f_m(x) \approx x^n - 2\beta_1^m x^{n-1} \cos m\phi_1 + \beta_1^{2m} x^{n-2} - (\beta_1^2\alpha_3)^m x^{n-3} + \cdots = 0 \qquad (10.17.11)$$

where $m = 2^k$, so that the coefficient of $x^{n-1}$ in the kth equation would tend to
fluctuate in magnitude and sign in the same way as $-2\beta_1^m \cos m\phi_1$, as k and
$m = 2^k$ increased, and hence again would not tend to be the square of the
corresponding coefficient in the (k - 1)th equation. The same sort of oscillation
would occur in the coefficient of $x^{n-r}$ if $\alpha_r$ and $\alpha_{r+1}$ were a complex root pair,
and, for k sufficiently large, $\beta_r$ and $\phi_r$ could be determined from the relations

$$\beta_r^{2m} \approx \frac{A_{r+1}}{A_{r-1}} \qquad 2\beta_r^m \cos m\phi_r \approx \frac{A_r}{A_{r-1}} \qquad (10.17.12)$$

if no other root were also of magnitude $\beta_r$. The magnitude $\beta_r$ thus would be the
positive real (2m)th root of $A_{r+1}/A_{r-1}$, whereas the appropriate one of the
values of $\phi_r$ obtained from the second relation would have to be selected by
trial and error or otherwise.

When only one such pair of complex roots is present, say,

$$\beta_r e^{\pm i\phi_r} = \xi_r \pm i\eta_r$$

the selection of the appropriate value of $\phi_r$ satisfying this relation can be avoided
by noticing that, since the sum of all roots of (10.17.1) is given by $-a_1$, there
follows

$$\alpha_1 + \alpha_2 + \cdots + \alpha_{r-1} + 2\xi_r + \alpha_{r+2} + \cdots + \alpha_n = -a_1 \qquad (10.17.13)$$

Hence $\xi_r$ is given immediately when the remaining n - 2 roots are known,
after which $\eta_r$ is given by $\sqrt{\beta_r^2 - \xi_r^2}$.
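If two pairs of complex roots are present, say,

$$\beta_r e^{\pm i\phi_r} = \xi_r \pm i\eta_r \qquad \text{and} \qquad \beta_s e^{\pm i\phi_s} = \xi_s \pm i\eta_s$$

the corresponding relation is

$$2(\xi_r + \xi_s) = -(a_1 + \alpha_1 + \cdots + \alpha_{r-1} + \alpha_{r+2} + \cdots + \alpha_{s-1} + \alpha_{s+2} + \cdots + \alpha_n) \qquad (10.17.14)$$

A second linear relation between $\xi_r$ and $\xi_s$ is then obtained by recalling that the
sum of the reciprocals of the roots is $-a_{n-1}/a_n$, so that

$$\frac{1}{\alpha_1} + \cdots + \frac{1}{\xi_r + i\eta_r} + \frac{1}{\xi_r - i\eta_r} + \cdots + \frac{1}{\xi_s + i\eta_s} + \frac{1}{\xi_s - i\eta_s} + \cdots + \frac{1}{\alpha_n} = -\frac{a_{n-1}}{a_n}$$

or, after rationalizing the reciprocals of the complex numbers and transposing
terms,

$$2\left(\frac{\xi_r}{\beta_r^2} + \frac{\xi_s}{\beta_s^2}\right) = -\left(\frac{a_{n-1}}{a_n} + \frac{1}{\alpha_1} + \cdots + \frac{1}{\alpha_n}\right) \qquad (10.17.15)$$

where the reciprocals of the four complex roots are to be omitted in the
right-hand member. Since the magnitudes $\beta_r$ and $\beta_s$ are known, the relations (10.17.14)
and (10.17.15) comprise two linear equations for the determination of $\xi_r$ and
$\xi_s$ if $\beta_r \ne \beta_s$, after which $\eta_r = \sqrt{\beta_r^2 - \xi_r^2}$ and $\eta_s = \sqrt{\beta_s^2 - \xi_s^2}$.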

In other situations where two or more of the original roots are of equal
(or nearly equal) magnitude, it may be possible to exploit available information
(for example, with regard to reality) for the purpose of determining which of the
admissible $\phi$’s are appropriate. Otherwise, one may apply the Graeffe method
both to f(x) = 0 and also to the equation $f(x + \varepsilon) = 0$, with a fixed value of $\varepsilon$,
and select the appropriate candidates as points of intersection of circles of radii
$|\alpha_r|$ and $|\alpha_r + \varepsilon|$ in the complex plane (see Prob. 116). A procedure using
information corresponding to the limiting situation where $\varepsilon \to 0$ was proposed
by Brodetsky and Smeal [1924] and systematized by Lehmer [1945, 1963].

In place of actually multiplying together the polynomials f(x) and
$(-1)^n f(-x)$ to obtain the function $f_2(x)$, it is desirable to work with detached
coefficients and to obtain formulas recursively relating the new coefficients to
the original ones. For this purpose, it is convenient to write

$$f(x) \equiv A_0x^n - A_1x^{n-1} + A_2x^{n-2} - \cdots = \sum_{i=0}^{\infty} (-1)^i A_i x^{n-i} \qquad (10.17.16)$$

with the convention that $A_i = 0$ when i > n. If we use this convention, there
follows

$$(-1)^n f(x) f(-x) = \sum_{i=0}^{\infty} (-1)^i A_i x^{n-i} \sum_{j=0}^{\infty} A_j x^{n-j} = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} (-1)^i A_i A_j x^{2n-i-j}$$

Since clearly only the even powers of x will remain, we then may write
i + j = 2k and, after changing the limits appropriately, we have

$$(-1)^n f(x) f(-x) \equiv f_2(x^2) = \sum_{k=0}^{\infty} (-1)^k A'_k x^{2n-2k}$$

where

$$A'_k = \sum_{i=0}^{2k} (-1)^{i+k} A_i A_{2k-i}$$

Thus $f_2(x)$ is given by

$$f_2(x) = \sum_{k=0}^{\infty} (-1)^k A'_k x^{n-k} \qquad (10.17.17)$$

where

$$A'_k = A_k^2 - 2A_{k-1}A_{k+1} + 2A_{k-2}A_{k+2} - 2A_{k-3}A_{k+3} + \cdots \qquad (10.17.18)$$

and where the series of products terminates when either the first subscript
reduces to zero or the second increases to n. This formula is convenient because
of the fact that the coefficients $A_{k-j}$ and $A_{k+j}$ involved in each product are
symmetrically placed about $A_k$.
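One root-squaring step by (10.17.18) can be sketched as follows. The coefficients are held as the list $[A_0, \ldots, A_n]$ in the sign-alternating form (10.17.16); the function name is illustrative, and integers are used so that the results are exact.

```python
def graeffe_step(A):
    """Apply (10.17.18) once; A = [A_0, ..., A_n] as in (10.17.16)."""
    n = len(A) - 1
    new = []
    for k in range(n + 1):
        total = A[k] * A[k]
        sign = -1
        j = 1
        # products of coefficients placed symmetrically about A_k
        while k - j >= 0 and k + j <= n:
            total += sign * 2 * A[k - j] * A[k + j]
            sign = -sign
            j += 1
        new.append(total)
    return new


# f(x) = x^3 - x - 1 in the form x^3 - A_1 x^2 + A_2 x - A_3
A = [1, 0, -1, 1]
for m in (2, 4, 8, 16, 32, 64):
    A = graeffe_step(A)
    print(m, A)
```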

The procedure may be illustrated by the simple case of the cubic $f(x) =
x^3 - x - 1$ considered in (10.10.6), for which $A_0 = 1$, $A_1 = 0$, $A_2 = -1$,
and $A_3 = +1$, in accordance with (10.17.16). By making use of (10.17.18),
the coefficients of the successive equations, again written in the form $x^3 -
A_1x^2 + A_2x - A_3 = 0$, are obtained as follows in the first six iterations:

             A_0         A_1        A_2     A_3
   f          1            0         -1      1
   f_2        1            2          1      1
   f_4        1            2         -3      1
   f_8        1           10          5      1
   f_16       1           90          5      1
   f_32       1         8090       -155      1
   f_64       1     65448410       7845      1

The coefficients $A_0$ and $A_3$ here remain fixed, whereas the coefficient $A_1$ in $f_{64}$ is
the square of that in $f_{32}$, to five significant figures. The persistent fluctuation of
$A_2$ indicates that the roots $\alpha_2$ and $\alpha_3$ are conjugate complex.
Thus the sequence of approximations to $\alpha_1$ is $0, \sqrt{2}, \sqrt[4]{2}, \sqrt[8]{10}, \sqrt[16]{90},
\sqrt[32]{8090}, \sqrt[64]{65448410}, \ldots$, or 0, 1.4, 1.2, 1.33, 1.3248, 1.3247, .... The fact that
the positive sign is correct would be determined most easily by noticing that
f(x) changes sign between x = 1 and x = 2. Reference to the first equation
of (10.17.12), taking into account the fact that here $A_3 = 1$, shows that the
corresponding approximations to the magnitude $\beta_2$ of the complex root pair are
the reciprocal square roots of the approximations to $\alpha_1$, so that the best available
approximation is $\beta_2 \approx 0.86884$ to five places. Rather than use the second
relation of (10.17.12), which would involve choosing the appropriate value of
$\cos\phi_2$ for which $\cos 64\phi_2 = 0.68263$ from among 64 possibilities, we use
(10.17.13) to obtain $2\xi_2 = -\alpha_1$, and hence $\xi_2 \approx -0.6624$. Finally, there
follows $\eta_2 = \sqrt{\beta_2^2 - \xi_2^2} \approx 0.5622$, so that the approximate roots are 1.3247
and $-0.6624 \pm 0.5622i$.

In order to illustrate the calculation involved in less simple cases, we display
the results of five iterations as applied to the equation

$$x^4 - 10x^3 + 35x^2 - 50x + 24 = 0$$

when only three digits are retained:

            A_0       A_1         A_2         A_3         A_4
   f        1.00    1.00(1)     3.50(1)     5.00(1)     2.40(1)
   f_2      1.00    3.00(1)     2.73(2)     8.20(2)     5.76(2)
   f_4      1.00    3.54(2)     2.65(4)     3.58(5)     3.32(5)
   f_8      1.00    7.23(4)     4.49(8)     1.11(11)    1.10(11)
   f_16     1.00    4.33(9)     1.86(17)    1.22(22)    1.21(22)
   f_32     1.00    1.84(19)    3.45(34)    1.49(44)    1.46(44)

Here an integer in parentheses following a number represents the power of 10
by which that number is to be multiplied to give the relevant coefficient. The
entries are obtained simply by use of (10.17.18). For example, the coefficients
in $f_{16}$ may be calculated as follows:

$$A_1 = 10^{8}[(7.23)^2 - 2(1.00)(4.49)]$$
$$A_2 = 10^{16}[(4.49)^2 - 2(7.23)(1.11)(10^{-1}) + 2(1.00)(1.10)(10^{-5})]$$
$$A_3 = 10^{22}[(1.11)^2 - 2(4.49)(1.10)(10^{-3})]$$
$$A_4 = 10^{22}(1.10)^2$$

In a sixth iteration, the squared term in (10.17.18) obviously would not be
modified to three digits by the product terms in any case, so that the iteration is
terminated. Here all roots are clearly real, and the application of (10.17.8)
to $f_{32}$ yields the approximations $\alpha_1 \approx 4.000$, $\alpha_2 \approx 3.001$, $\alpha_3 \approx 2.000$, and
$\alpha_4 \approx 0.999$. The correctness of the positive signs is assured here by the fact that
the expression for f(-x) involves only positive coefficients, so that no negative
real roots can be present.

It is of some interest to notice that the use of (10.17.8) at earlier stages
of the iteration would yield the following sequences of approximate roots:

             α_1       α_2       α_3       α_4
   f       10.000     3.500     1.429     0.480
   f_2      5.477     3.017     1.733     0.838
   f_4      4.338     2.941     1.917     0.981
   f_8      4.049     2.979     1.991     0.999
   f_16     4.002     3.000     2.000     1.000
   f_32     4.000     3.001     2.000     0.999
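The successive approximations can be regenerated with the sketch below. It differs from the hand calculation in one respect, which is an assumption made here for simplicity: exact integer arithmetic is used instead of three retained digits, so the last figures of the output need not agree with the table. The function names are illustrative, and the magnitudes are taken from (10.17.8) on the assumption that all roots are real with distinct magnitudes.

```python
def graeffe_step(A):
    """Apply (10.17.18) once; A = [A_0, ..., A_n] as in (10.17.16)."""
    n = len(A) - 1
    new = []
    for k in range(n + 1):
        total = A[k] * A[k]
        sign = -1
        j = 1
        while k - j >= 0 and k + j <= n:
            total += sign * 2 * A[k - j] * A[k + j]
            sign = -sign
            j += 1
        new.append(total)
    return new


def root_magnitudes(A, m):
    """Magnitudes from (10.17.8): |alpha_k| = (A_k / A_{k-1})^(1/m)."""
    return [(A[k] / A[k - 1]) ** (1.0 / m) for k in range(1, len(A))]


# x^4 - 10x^3 + 35x^2 - 50x + 24 in the form (10.17.16)
A = [1, 10, 35, 50, 24]
f2 = graeffe_step(A)
m = 1
print(m, [round(v, 3) for v in root_magnitudes(A, m)])
for _ in range(5):
    A = graeffe_step(A)
    m *= 2
    print(m, [round(v, 3) for v in root_magnitudes(A, m)])
magnitudes = root_magnitudes(A, m)
```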


The Graeffe method possesses the theoretical advantages that the iteration 
leads to all zeros of f(x) at the same time, and that (as in the Bernoulli iteration) 
there is no question of the existence of ultimate convergence if appropriate 
attention is paid to the control of roundoff errors. As the preceding example 
illustrates, this control normally does not present difficulties in the Graeffe 
iteration. However, the process itself is often rather laborious for desk calcula- 
tion, and the extraction of algebraic roots of high order, which is involved in 
the process, is conveniently effected in machine calculation only by an iterative 
process (see Prob. 47). The possibility of rapid growth of the relevant coef- 
ficients in a prolonged sequence of root squarings leads also to the danger of 
overflow when use is made of a computer, unless appropriate precautions are 
taken. 

A serious disadvantage follows from the fact that a gross error committed 
at any stage of the calculation invalidates all subsequent calculations, whereas 
the other iterative methods considered here would suffer only a temporary 
reduction in the rate of convergence. 

Rather than use this method for the complete determination of the roots, 
it is often convenient merely to iterate sufficiently to obtain crude approx- 
imations, when such approximations are not easily obtained by other methods, 
and then to improve these approximations by simpler or more rapidly con- 
vergent methods. 

The root-squaring process is also useful in connection with the Bernoulli 
iteration, in cases when that iteration appears to converge slowly, since the rate 
of convergence increases with increasing values of the ratio of the magnitudes 
of the dominant and subdominant roots. Thus the convergence will be improved 
if the original equation is replaced by one whose roots are, say, the squares or 
fourth powers of the original roots. 
10.18 Quadratic Factors. Lin’s Quadratic Method


Among the most troublesome algebraic equations, in practice, are those which 
possess two or more pairs of nonreal roots. Whereas the methods of Secs. 
10.16 and 10.17 can be used in such cases, and will always generate convergent 
sequences of approximations, the convergence is often slow and the time or labor 
involved may be excessive. We next treat two methods which are similar to those 
considered in Sec. 10.14, but in which successive approximations to a quadratic 
factor are generated. Both methods have the property that the iteration may not 
converge unless the initial approximation is sufficiently good and, in fact, one 
of them may require a modification to yield a convergent sequence even in that 
case. Thus, in troublesome cases, the use of the Bernoulli or Graeffe iteration 
may be desirable in order to afford a reasonably good initial estimate. 
If the polynomial

$$f(x) = x^n + a_1 x^{n-1} + \cdots + a_{n-1} x + a_n \tag{10.18.1}$$

is divided by the quadratic expression $x^2 + px + q$, so that

$$\begin{aligned}
f(x) &= x^n + a_1 x^{n-1} + \cdots + a_{n-1} x + a_n \\
     &= (x^2 + px + q)(x^{n-2} + b_1 x^{n-3} + \cdots + b_{n-3} x + b_{n-2}) + Rx + S
\end{aligned} \tag{10.18.2}$$

the requirement that this expression be a factor of f(x) imposes the two
conditions

$$R = 0 \qquad S = 0 \tag{10.18.3}$$

where R and S are the coefficients of the linear remainder and are certain
functions of the parameters p and q.

In order to obtain a recursive method for calculating R and S without 
actually effecting the long division, we equate coefficients of like powers of x 
in the two members of (10.18.2) and thus obtain the relations 


$$\begin{aligned}
a_1 &= b_1 + p \qquad a_2 = b_2 + p b_1 + q \qquad a_3 = b_3 + p b_2 + q b_1 \\
a_k &= b_k + p b_{k-1} + q b_{k-2} \\
a_{n-2} &= b_{n-2} + p b_{n-3} + q b_{n-4} \qquad a_{n-1} = R + p b_{n-2} + q b_{n-3} \\
a_n &= S + q b_{n-2}
\end{aligned} \tag{10.18.4}$$

Thus, if we introduce the recurrence formula

$$b_k = a_k - p b_{k-1} - q b_{k-2} \qquad (k = 1, 2, \ldots, n) \tag{10.18.5}$$

with

$$b_{-1} = 0 \qquad b_0 = 1 \tag{10.18.6}$$

it follows that this formula will generate the coefficients of the quotient in
(10.18.2) with k = 1, 2, ..., n − 2, and also that

$$R = b_{n-1} = a_{n-1} - p b_{n-2} - q b_{n-3} \tag{10.18.7}$$

$$S = b_n + p b_{n-1} = a_n - q b_{n-2} \tag{10.18.8}$$

Hence the expression $x^2 + px + q$ will factor f(x) if and only if the conditions

$$R = a_{n-1} - p b_{n-2} - q b_{n-3} = 0 \qquad S = a_n - q b_{n-2} = 0 \tag{10.18.9}$$


are satisfied. 
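
The recurrence (10.18.5) to (10.18.8) can be sketched in a few lines of code.
The following is only an illustrative sketch, not part of the original text:
the function name `quadratic_synthetic_division` and the convention of
passing the coefficients as the list `[a_1, ..., a_n]` (leading coefficient 1
implied, n at least 2) are assumptions made here.

```python
def quadratic_synthetic_division(a, p, q):
    """Divide f(x) = x^n + a[0] x^(n-1) + ... + a[n-1] by x^2 + p x + q.

    Returns (quotient, R, S): quotient = [b_1, ..., b_{n-2}] are the quotient
    coefficients after the leading 1, and R x + S is the linear remainder.
    Assumes n = len(a) >= 2.
    """
    n = len(a)
    b_prev2, b_prev1 = 0.0, 1.0          # b_{-1} = 0, b_0 = 1      (10.18.6)
    b = []
    for k in range(1, n):                # b_1, ..., b_{n-1}        (10.18.5)
        b_k = a[k - 1] - p * b_prev1 - q * b_prev2
        b.append(b_k)
        b_prev2, b_prev1 = b_prev1, b_k
    R = b[-1]                            # R = b_{n-1}              (10.18.7)
    quotient = b[:-1]                    # b_1, ..., b_{n-2}
    b_n2 = quotient[-1] if quotient else 1.0   # b_{n-2}; equals b_0 when n = 2
    S = a[-1] - q * b_n2                 # S = a_n - q b_{n-2}      (10.18.8)
    return quotient, R, S


# x^4 - 8x^3 + 39x^2 - 62x + 50 divided by its exact factor x^2 - 2x + 2
quotient, R, S = quadratic_synthetic_division([-8, 39, -62, 50], -2, 2)
```

With an exact factor both remainder coefficients vanish, in agreement with
(10.18.9), and the quotient coefficients are those of the cofactor.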
Lin’s quadratic iteration consists of applying the method of successive
substitutions to the result of rewriting (10.18.9) in the form

$$p = \frac{a_{n-1} - q b_{n-3}}{b_{n-2}} \qquad q = \frac{a_n}{b_{n-2}}$$

so that “improved” values of p and q are defined by the formulas

$$p^* = \frac{a_{n-1} - q b_{n-3}}{b_{n-2}} \qquad q^* = \frac{a_n}{b_{n-2}} \tag{10.18.10}$$

and hence

$$p^* - p = \frac{a_{n-1} - p b_{n-2} - q b_{n-3}}{b_{n-2}} \qquad q^* - q = \frac{a_n - q b_{n-2}}{b_{n-2}}$$

or, equivalently, by virtue of (10.18.7) and (10.18.8),

$$p^* = p + \frac{R}{b_{n-2}} \qquad q^* = q + \frac{S}{b_{n-2}} \tag{10.18.11}$$



In analogy to (10.14.14), it is known that, if p and q are to be such that
the zeros of $x^2 + px + q$ approximate the true zeros $\alpha_1$ and $\alpha_2$ of f(x), then
the two relevant asymptotic convergence factors are (see Prob. 120)

$$\rho_1 = 1 + \frac{\alpha_1 \alpha_2 f'(\alpha_1)}{(\alpha_2 - \alpha_1)\, a_n} \qquad \rho_2 = 1 - \frac{\alpha_1 \alpha_2 f'(\alpha_2)}{(\alpha_2 - \alpha_1)\, a_n} \tag{10.18.12}$$

That is, if either or both of these factors exceeds unity in absolute value, then
one or both of the zeros of the modified expression $x^2 + p^*x + q^*$ generally
will afford poorer approximations to $\alpha_1$ and $\alpha_2$ than the zeros of the expression
$x^2 + px + q$. Thus, if $x^2 + px + q$ is to converge to $(x - \alpha_1)(x - \alpha_2)$,
it is generally necessary that

$$|\rho_1| < 1 \qquad |\rho_2| < 1 \tag{10.18.13}$$


In addition, it is necessary that the initial estimates of p and q not differ
excessively from $-(\alpha_1 + \alpha_2)$ and $\alpha_1 \alpha_2$, respectively. The result (10.18.12) is useful
only if fair approximations to a pair of roots can be obtained in advance. In
analogy to (10.14.16), the conditions (10.18.13) can also be expressed in the form

$$\left| 1 - \left(1 - \frac{\alpha_k}{\alpha_3}\right)\left(1 - \frac{\alpha_k}{\alpha_4}\right) \cdots \left(1 - \frac{\alpha_k}{\alpha_n}\right) \right| < 1 \qquad (k = 1, 2) \tag{10.18.14}$$

In the absence of preliminary information, the iteration may be started
with arbitrarily chosen values of p and q, in the hope that convergence to some
root pair (real or complex) will ensue. With the convenient initial choice p = 0,
q = 0, the first iteration always yields the quadratic

$$x^2 + \frac{a_{n-1}}{a_{n-2}}\, x + \frac{a_n}{a_{n-2}}$$

whose zeros will approximate the two smallest roots of f(x) = 0 if those roots
are sufficiently small relative to the others. It is seen that the initial choice
$p = a_1$, $q = a_2$, corresponding to the quadratic $x^2 + a_1 x + a_2$, whose zeros
would approximate the two largest roots of f(x) = 0 if those roots were
sufficiently separated in magnitude from the others, leads always to $b_{n-2} = 0$
in the first iteration when n = 3 or n = 4, so that the following iteration then is
undefined. This fact suggests that convergence of the Lin quadratic iteration to
the largest root pair generally cannot be obtained when that pair is widely
separated in magnitude from the others, as can be seen also by noticing that
(10.18.14) then will tend to be violated.

In the more general case, however, (10.18.14) shows that the possibility 
of convergence to the largest root pair, or to any other chosen root pair, depends 
in a fairly complicated way upon the configuration of all the roots. 
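
The dependence on the root configuration can be explored numerically. The
sketch below is an illustration added here, not taken from the text: it
evaluates the product form of the convergence factors, namely
rho_k = 1 - prod_{j>=3} (1 - alpha_k/alpha_j), for a chosen root pair, with
the remaining roots supplied separately. The function name
`lin_convergence_factors` and its argument layout are assumptions; the sample
roots 1 ± i and 3 ± 4i are those of the quartic used as an illustration in
this section.

```python
def lin_convergence_factors(pair, others):
    """Asymptotic convergence factors of the Lin quadratic iteration.

    pair   : the two roots (alpha_1, alpha_2) of the quadratic factor sought
    others : the remaining roots alpha_3, ..., alpha_n (all nonzero)
    Returns [rho_1, rho_2], with rho_k = 1 - prod_j (1 - alpha_k / alpha_j).
    """
    factors = []
    for alpha_k in pair:
        prod = 1.0 + 0.0j
        for alpha_j in others:
            prod *= 1.0 - alpha_k / alpha_j
        factors.append(1.0 - prod)
    return factors


# Roots 1 +/- i and 3 +/- 4i of x^4 - 8x^3 + 39x^2 - 62x + 50
small = lin_convergence_factors([1 + 1j, 1 - 1j], [3 + 4j, 3 - 4j])
large = lin_convergence_factors([3 + 4j, 3 - 4j], [1 + 1j, 1 - 1j])
print([abs(r) for r in small])   # both magnitudes below unity
print([abs(r) for r in large])   # both magnitudes above unity
```

For this configuration the factors for the smaller pair have magnitude of
about 0.29, whereas those for the larger pair exceed unity in magnitude, so
that only the smaller pair can be approached by the iteration.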

For desk calculation, the data can be arranged in parallel columns, as
follows:

$$\begin{array}{cc}
1 & 1 \\
a_1 & b_1 \\
\vdots & \vdots \\
a_{n-2} & b_{n-2} \\
a_{n-1} & R \\
a_n & S
\end{array}$$

Here each entry in the b column except the last is obtained by subtracting from
its left-hand neighbor p times its first upward neighbor and q times its second
upward neighbor. (In calculating $b_0$ and $b_1$, the missing entries are taken to be
zero.) The last element (S) is calculated in the same way except that its first
upward neighbor is imagined to be replaced by zero. Finally, there follows

$$\Delta p \equiv p^* - p = \frac{R}{b_{n-2}} \qquad \text{and} \qquad \Delta q \equiv q^* - q = \frac{S}{b_{n-2}}$$

In illustration, the quartic equation

$$f(x) = x^4 - 8x^3 + 39x^2 - 62x + 50 = 0 \tag{10.18.15}$$

possesses the complex roots 1 ± i and 3 ± 4i, and f(x) is factorable in the
form $f(x) = (x^2 - 2x + 2)(x^2 - 6x + 25)$. The first steps in a Lin iteration,
assuming ignorance of this information, and starting with p = q = 0, may be
tabulated as follows:

    p =           0       -1.6     -1.95    -2.009   -2.008   -2.003   -2.0007
    q =           0        1.3      1.82     1.970    2.001    2.003    2.0012
    ---------------------------------------------------------------------------
       1          1        1        1        1        1        1
      -8         -8       -6.4     -6.05    -5.991   -5.992   -5.997
      39         39       27.5     25.38    24.994   24.967   24.985
     -62        -62       -9.7     -1.50     0.015    0.124    0.057
      50         50       14.2      3.81     0.762    0.041   -0.045
    ---------------------------------------------------------------------------
    Δp =         -1.6     -0.35    -0.059    0.001    0.005    0.0023
    Δq =          1.3      0.52     0.150    0.031    0.002   -0.0018

(The first column contains the coefficients of f(x); each following column
is the b column for the values of p and q at its head, and the last pair of
values of p and q is the result of the sixth iteration.)

Thus, at this stage, the approximate factorization is

$$f(x) \approx (x^2 - 2.001x + 2.001)(x^2 - 5.997x + 24.985)$$
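
The tabulated steps can be imitated by a short program. This is a minimal
sketch added for illustration, assuming full floating-point precision rather
than the rounding used in the table, so that the iterates agree with the
tabulated ones only to within the rounding of the latter; the name `lin_step`
is hypothetical.

```python
def lin_step(a, p, q):
    """One Lin quadratic step: returns (p*, q*) from (10.18.11).

    a = [a_1, ..., a_n] are the coefficients of the monic polynomial (n >= 2).
    """
    n = len(a)
    b = [0.0, 1.0]                       # b_{-1}, b_0
    for k in range(1, n):                # appends b_1, ..., b_{n-1}
        b.append(a[k - 1] - p * b[-1] - q * b[-2])
    R = b[-1]                            # R = b_{n-1}
    b_n2 = b[-2]                         # b_{n-2}
    S = a[-1] - q * b_n2                 # S = a_n - q b_{n-2}
    return p + R / b_n2, q + S / b_n2


a = [-8.0, 39.0, -62.0, 50.0]            # x^4 - 8x^3 + 39x^2 - 62x + 50
p, q = 0.0, 0.0
history = [(p, q)]
for _ in range(6):
    p, q = lin_step(a, p, q)
    history.append((p, q))

for step, (pk, qk) in enumerate(history):
    print(step, round(pk, 4), round(qk, 4))
```

The first computed pair is (−62/39, 50/39), which rounds to the tabulated
(−1.6, 1.3), and the later pairs drift toward (−2, 2) at the slow linear rate
noted in the text.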


The Lin iteration technique is perhaps the simplest known method for 
the numerical solution of algebraic equations, when two or more pairs of complex 
roots are present. However, it possesses the disadvantage that convergence 
is not certain, even though the starting values are good approximations to true 
values, and that the rate of convergence, when present, is often rather slow. The 
following section describes a somewhat more elaborate method which usually 
has better convergence properties, as well as a less elaborate modification of the 
present method. 

Use of (10.18.11) shows that the relation (10.18.2) is equivalent to the
relation

$$f(x) = (x^2 + px + q)(x^{n-2} + b_1 x^{n-3} + \cdots + b_{n-3} x) + b_{n-2}(x^2 + p^* x + q^*) \tag{10.18.16}$$

Thus it follows that if f(x) is divided by the trial factor $x^2 + px + q$, and if
the steps in the division are terminated when the remainder is quadratic (rather
than linear), the new Lin trial factor $x^2 + p^*x + q^*$ can be obtained by dividing
that remainder by its leading coefficient. For this reason, Aitken [1952] refers
to the new Lin trial factor as the reduced penultimate remainder and to Lin’s
quadratic method as the RPR method.
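
As a brief worked instance of this interpretation (added here for
illustration, using the quartic $x^4 - 8x^3 + 39x^2 - 62x + 50$ and the trial
values p = q = 0 as the assumed data), division by the trial factor $x^2$,
stopped at the quadratic remainder, gives

$$x^4 - 8x^3 + 39x^2 - 62x + 50 = x^2\,(x^2 - 8x) + (39x^2 - 62x + 50)$$

so that the reduced penultimate remainder is

$$x^2 - \tfrac{62}{39}\,x + \tfrac{50}{39} \approx x^2 - 1.59x + 1.28$$

in agreement with the values obtained from (10.18.10) with $b_{n-2} = 39$.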

It may be noted that Lin also suggested an alternative technique in which
the new value q* is calculated first from the second relation of (10.18.10),
after which q* is used in place of q in the first relation for the calculation of an
“improved” value of p*. In some cases this alternative affords improved
convergence; in others (including the preceding example) the reverse is true.


10.19 Bairstow Iteration 


Another iterative method for solving algebraic equations, apparently first 
devised by Bairstow, but rediscovered by Hitchcock and others, differs from the 
Lin method in that the equations 


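$$R(p, q) = 0 \qquad S(p, q) = 0 \tag{10.19.1}$$

are solved by Newton-Raphson iteration, rather than by the method of
successive substitutions used by Lin, so that it is a second-order process.
By virtue of the relations (10.18.7) and (10.18.8), we have

$$R = b_{n-1} \qquad S = b_n + p b_{n-1} \tag{10.19.2}$$

and hence the Newton-Raphson recurrence relations (10.13.8) become

$$\frac{\partial b_{n-1}}{\partial p}\,\Delta p + \frac{\partial b_{n-1}}{\partial q}\,\Delta q + b_{n-1} = 0$$

and

$$\left(\frac{\partial b_n}{\partial p} + p\,\frac{\partial b_{n-1}}{\partial p} + b_{n-1}\right)\Delta p + \left(\frac{\partial b_n}{\partial q} + p\,\frac{\partial b_{n-1}}{\partial q}\right)\Delta q + b_n + p b_{n-1} = 0$$

where Δp = p* − p and Δq = q* − q. If the second relation is simplified,
by subtracting from it p times the first equation, the two relations become

$$\begin{aligned}
\frac{\partial b_{n-1}}{\partial p}\,\Delta p + \frac{\partial b_{n-1}}{\partial q}\,\Delta q + b_{n-1} &= 0 \\
\left(\frac{\partial b_n}{\partial p} + b_{n-1}\right)\Delta p + \frac{\partial b_n}{\partial q}\,\Delta q + b_n &= 0
\end{aligned} \tag{10.19.3}$$

If we recall that the b’s are defined in terms of the coefficients of f(x) by
the recurrence formula (10.18.5),

$$b_k = a_k - p b_{k-1} - q b_{k-2} \quad (k = 1, 2, \ldots, n) \qquad b_{-1} = 0 \quad b_0 = 1 \tag{10.19.4}$$

it remains only to determine the partial derivatives involved in (10.19.3).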
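For this purpose, we obtain from the relation (10.19.4) the additional relations

$$\frac{\partial b_k}{\partial p} = -b_{k-1} - p\,\frac{\partial b_{k-1}}{\partial p} - q\,\frac{\partial b_{k-2}}{\partial p} \qquad \frac{\partial b_{-1}}{\partial p} = 0 \quad \frac{\partial b_0}{\partial p} = 0 \tag{10.19.5}$$

and

$$\frac{\partial b_k}{\partial q} = -b_{k-2} - p\,\frac{\partial b_{k-1}}{\partial q} - q\,\frac{\partial b_{k-2}}{\partial q} \qquad \frac{\partial b_{-1}}{\partial q} = 0 \quad \frac{\partial b_0}{\partial q} = 0 \tag{10.19.6}$$

Hence, if we introduce a new recurrence formula

$$c_k = b_k - p c_{k-1} - q c_{k-2} \quad (k = 1, 2, \ldots, n - 1) \qquad c_{-1} = 0 \quad c_0 = 1 \tag{10.19.7}$$

it follows, from (10.19.5), that

$$\frac{\partial b_k}{\partial p} = -c_{k-1} \qquad (k = 1, 2, \ldots, n) \tag{10.19.8}$$

and, from (10.19.6), that

$$\frac{\partial b_k}{\partial q} = -c_{k-2} \qquad (k = 1, 2, \ldots, n) \tag{10.19.9}$$

where the c’s are obtained from the b’s just as the b’s are obtained from the a’s.
Thus the first n − 4 of the c’s are the coefficients in the relation

$$\begin{aligned}
x^{n-2} + b_1 x^{n-3} + \cdots + b_{n-3} x + b_{n-2}
  &= (x^2 + px + q)(x^{n-4} + c_1 x^{n-5} + \cdots + c_{n-5} x + c_{n-4}) \\
  &\quad + R'x + S'
\end{aligned} \tag{10.19.10}$$

and also

$$R' = c_{n-3} \qquad S' = c_{n-2} + p c_{n-3} \tag{10.19.11}$$

In particular, we have

$$\frac{\partial b_{n-1}}{\partial q} = -c_{n-3} \qquad \frac{\partial b_{n-1}}{\partial p} = \frac{\partial b_n}{\partial q} = -c_{n-2} \tag{10.19.12}$$

so that three of the four desired coefficients in (10.19.3) are now identified,
and are calculable from (10.19.7). When k = n, Eq. (10.19.8) gives

$$\frac{\partial b_n}{\partial p} = -c_{n-1} \tag{10.19.13}$$

and hence the remaining coefficient in (10.19.3) is given by

$$\frac{\partial b_n}{\partial p} + b_{n-1} = -\bar{c}_{n-1} \tag{10.19.14}$$

where, in accordance with (10.19.7),

$$\bar{c}_{n-1} = c_{n-1} - b_{n-1} = -p c_{n-2} - q c_{n-3} \tag{10.19.15}$$

The basic equations of the Bairstow iteration then take the simple form

$$\begin{aligned}
c_{n-2}\,\Delta p + c_{n-3}\,\Delta q &= b_{n-1} \\
\bar{c}_{n-1}\,\Delta p + c_{n-2}\,\Delta q &= b_n
\end{aligned} \tag{10.19.16}$$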

and the principal calculation involved in an iteration can be arranged as follows:

$$\begin{array}{ccc}
1 & 1 & 1 \\
a_1 & b_1 & c_1 \\
\vdots & \vdots & \vdots \\
a_{n-4} & b_{n-4} & c_{n-4} \\
a_{n-3} & b_{n-3} & c_{n-3} \\
a_{n-2} & b_{n-2} & c_{n-2} \\
a_{n-1} & b_{n-1} & \bar{c}_{n-1} \\
a_n & b_n &
\end{array}$$

Here each element in the b column (including $b_n$), and each element of the c
column except the last one ($\bar{c}_{n-1}$), is calculated as in the Lin iteration, as the
result of subtracting from the element to its left p times the last calculated
element above it and q times the next-to-last element above it. The element
$\bar{c}_{n-1}$ is calculated in the same way except that the element to its left is imagined
to be replaced by zero.
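
The identification of the partial derivatives of the b’s with the c’s can be
checked numerically. The following sketch is an added illustration, not part
of the original text: it compares central-difference estimates of the
derivatives with −c_{k−1} and −c_{k−2} for the quartic
x^4 − 8x^3 + 39x^2 − 62x + 50 at the arbitrarily chosen point
p = −1.3, q = 1.3. The helper names and the step size h are assumptions.

```python
def b_sequence(a, p, q):
    """Return [b_0, b_1, ..., b_n] from b_k = a_k - p b_{k-1} - q b_{k-2}."""
    b = [0.0, 1.0]                       # b_{-1}, b_0
    for a_k in a:
        b.append(a_k - p * b[-1] - q * b[-2])
    return b[1:]


def c_sequence(b, p, q):
    """Return [c_0, ..., c_{n-1}]: the same recurrence applied to b_1..b_{n-1}."""
    c = [0.0, 1.0]                       # c_{-1}, c_0
    for b_k in b[1:-1]:
        c.append(b_k - p * c[-1] - q * c[-2])
    return c[1:]


a = [-8.0, 39.0, -62.0, 50.0]
p, q, h = -1.3, 1.3, 1e-6
n = len(a)
b = b_sequence(a, p, q)
c = c_sequence(b, p, q)

# Central differences for the partial derivatives of b_0, ..., b_n
db_dp = [(u - v) / (2 * h)
         for u, v in zip(b_sequence(a, p + h, q), b_sequence(a, p - h, q))]
db_dq = [(u - v) / (2 * h)
         for u, v in zip(b_sequence(a, p, q + h), b_sequence(a, p, q - h))]

# Largest deviations from db_k/dp = -c_{k-1} and db_k/dq = -c_{k-2}
max_err_p = max(abs(db_dp[k] + c[k - 1]) for k in range(1, n + 1))
max_err_q = max(abs(db_dq[k] + (c[k - 2] if k >= 2 else 0.0))
                for k in range(1, n + 1))
print(max_err_p, max_err_q)
```

Both deviations are at the level of the differencing error, which is
consistent with the two derivative relations for k = 1, 2, ..., n.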

In addition, it is necessary to solve the simultaneous linear equations
(10.19.16) for the corrections to be added to p and q to give p* and q*. For this
purpose, the quantities

$$D = c_{n-2}^2 - \bar{c}_{n-1} c_{n-3} \tag{10.19.17}$$

and

$$D_p = b_{n-1} c_{n-2} - b_n c_{n-3} \qquad D_q = b_n c_{n-2} - b_{n-1} \bar{c}_{n-1} \tag{10.19.18}$$

may be recorded, after which there follows

$$\Delta p = \frac{D_p}{D} \qquad \Delta q = \frac{D_q}{D} \tag{10.19.19}$$

The first three stages of the result of applying the Bairstow iteration to the
equation (10.18.15), again starting with p = q = 0, appear as follows:

    p, q =             0, 0             -1.3, 1.3           -1.9, 1.9        -1.998, 1.998
      a             b       c          b        c          b        c
    -------------------------------------------------------------------
       1            1       1          1        1          1        1
      -8           -8      -8         -6.7     -5.4       -6.10    -4.20
      39           39      39         29.0     20.7       25.51    15.63
     -62          -62       0        -15.6     33.9       -1.941   37.68
      50           50                 -8.0                -2.157
    -------------------------------------------------------------------
    D =              1521                612                 403
    D_p, D_q =    -2018, 1950         -366, 363          -39.4, 39.4
    Δp, Δq =       -1.3, 1.3          -0.6, 0.6         -0.098, 0.098

The next (fourth) iteration gives p ≈ −1.9999992 and q ≈ 1.9999992 if
sufficiently many digits are retained in the calculation. A comparison of these 
results with those obtained in the preceding section illustrates the fact that 
whereas the Bairstow iteration may converge more slowly than the Lin iteration 
in the early stages, when both iterations converge, its ultimate rate of convergence 
in such cases is far superior. This is due to the fact that it is a second-order 
process, whereas the Lin iteration is a first-order process. 
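
The second-order behavior can be observed with a short program. The sketch
below is an added illustration under stated assumptions: it forms
D = c_{n-2}^2 − c̄_{n-1} c_{n-3}, D_p = b_{n-1} c_{n-2} − b_n c_{n-3}, and
D_q = b_n c_{n-2} − b_{n-1} c̄_{n-1}, with c̄_{n-1} = −p c_{n-2} − q c_{n-3},
and then applies Δp = D_p/D, Δq = D_q/D without the intermediate rounding
used in the tabulation. The function name `bairstow_quantities` is
hypothetical.

```python
def bairstow_quantities(a, p, q):
    """Return (D, D_p, D_q) for one Bairstow step; a = [a_1, ..., a_n], n >= 2."""
    n = len(a)
    b = [0.0, 1.0]                       # b_{-1}, b_0
    for a_k in a:                        # appends b_1, ..., b_n
        b.append(a_k - p * b[-1] - q * b[-2])
    b = b[1:]                            # now b[k] is b_k
    c = [0.0, 1.0]                       # c_{-1}, c_0
    for k in range(1, n - 1):            # appends c_1, ..., c_{n-2}
        c.append(b[k] - p * c[-1] - q * c[-2])
    c = c[1:]                            # now c[k] is c_k
    c_n2 = c[n - 2]
    c_n3 = c[n - 3] if n >= 3 else 0.0   # c_{-1} = 0 when n = 2
    c_bar = -p * c_n2 - q * c_n3         # last c entry, left neighbor taken as 0
    D = c_n2 ** 2 - c_bar * c_n3
    D_p = b[n - 1] * c_n2 - b[n] * c_n3
    D_q = b[n] * c_n2 - b[n - 1] * c_bar
    return D, D_p, D_q


a = [-8.0, 39.0, -62.0, 50.0]            # x^4 - 8x^3 + 39x^2 - 62x + 50
first = bairstow_quantities(a, 0.0, 0.0)

p, q = 0.0, 0.0
for step in range(6):
    D, D_p, D_q = bairstow_quantities(a, p, q)
    p, q = p + D_p / D, q + D_q / D
    print(step + 1, p, q)
```

The first stage reproduces D = 1521, D_p = −2018, D_q = 1950, and once the
iterates are near (−2, 2) the number of correct digits roughly doubles at
each step.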

Furthermore, the Bairstow iteration will converge if the starting values of 
p and q are sufficiently close to true values, whereas in the Lin iteration this is 
not always the case. On the other hand, the Bairstow iteration appears to be 
somewhat more sensitive to the choice of starting values than the Lin iteration, 
in the sense that, if the Lin iteration is asymptotically stable at $(\alpha_1, \alpha_2)$, it may
converge with starting values which correspond to cruder approximations to
$(\alpha_1, \alpha_2)$ than are required for convergence of the Bairstow iteration.

Various modifications of both the Lin and Bairstow procedures are possible 
(see Prob. 125). 

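In particular, a modification of the Lin quadratic method which is
analogous to that defined by (10.14.18) and (10.14.20) can be obtained by first
applying the Newton-Raphson formulas (10.13.8) to the functions

$$\varphi(p, q) = \frac{R}{b_{n-2}} = \frac{qR}{a_n - S} \qquad \psi(p, q) = \frac{S}{b_{n-2}} = \frac{qS}{a_n - S} \tag{10.19.20}$$

with the partial derivatives of φ and ψ relative to p and q evaluated in the limiting
(ideal) situation when R = S = 0. (Compare Prob. 91.) For example, there
follows

$$\frac{\partial \varphi}{\partial p} = \frac{q}{a_n}\,\frac{\partial R}{\partial p} = -\frac{q}{a_n}\,c_{n-2}$$

and, similarly,

$$\frac{\partial \varphi}{\partial q} = -\frac{q}{a_n}\,c_{n-3}$$

$$\frac{\partial \psi}{\partial p} = -\frac{q}{a_n}\,(c_{n-1} - b_{n-1} + p c_{n-2})$$

$$\frac{\partial \psi}{\partial q} = -\frac{q}{a_n}\,(c_{n-2} + p c_{n-3})$$

Here p and q, as well as the c’s which depend upon them, are imagined to
have their ideal values, such that $x^2 + px + q$ is indeed a factor of f(x). If,
instead, approximate values $\bar{p}$ and $\bar{q}$ are used, and if the corresponding values of
the c’s are also indicated by bars, the associated simulated Newton-Raphson
formulas become

$$\begin{aligned}
\bar{c}_{n-2}\,\Delta p + \bar{c}_{n-3}\,\Delta q &= \frac{a_n}{\bar{q}}\,\frac{R}{b_{n-2}} \\
(\bar{c}_{n-1} + \bar{p}\,\bar{c}_{n-2})\,\Delta p + (\bar{c}_{n-2} + \bar{p}\,\bar{c}_{n-3})\,\Delta q &= \frac{a_n}{\bar{q}}\,\frac{S}{b_{n-2}}
\end{aligned} \tag{10.19.21}$$

after a slight rearrangement, where $\bar{c}_{n-1}$ is formed as in (10.19.15).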
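The solution of these equations can be written in the form

$$\Delta p = \frac{\lambda_{11} R + \lambda_{12} S}{b_{n-2}} \qquad \Delta q = \frac{\lambda_{21} R + \lambda_{22} S}{b_{n-2}} \tag{10.19.22}$$

where

$$\begin{aligned}
\lambda_{11} &= \frac{a_n}{\bar{q}}\,\frac{\bar{c}_{n-2} + \bar{p}\,\bar{c}_{n-3}}{\bar{D}} &
\lambda_{12} &= -\frac{a_n}{\bar{q}}\,\frac{\bar{c}_{n-3}}{\bar{D}} \\
\lambda_{21} &= -\frac{a_n}{\bar{q}}\,\frac{\bar{c}_{n-1} + \bar{p}\,\bar{c}_{n-2}}{\bar{D}} &
\lambda_{22} &= \frac{a_n}{\bar{q}}\,\frac{\bar{c}_{n-2}}{\bar{D}}
\end{aligned} \tag{10.19.23}$$

and where

$$\bar{D} = \bar{c}_{n-2}^{\,2} - \bar{c}_{n-1}\,\bar{c}_{n-3} \tag{10.19.24}$$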


Here the c’s and the λ’s are evaluated once only, in correspondence with the
approximations $\bar{p}$ and $\bar{q}$ to the desired parameters, after which only one quadratic
synthetic division is required per iteration, the calculation differing from that in
the Lin method only in that Δp = p* − p and Δq = q* − q are determined
by (10.19.22) rather than by (10.18.11). In particular, here again the last entry
in the b column is S (rather than $b_n$, as in the Bairstow process).

If $\bar{p}$ and $\bar{q}$ are fair approximations to true values, this modified Lin
process simulates a second-order process for a limited number of steps. As in
the case of the modified linear-factor method, ultimately the superiority of a
truly second-order process (such as Bairstow’s method) would be evident if a
sufficiently large number of iterations were required.

A somewhat simpler process consists of using (10.19.16) with the c’s 
replaced by their initial values throughout the iteration. 

To illustrate the modified Lin quadratic process, we again consider
(10.18.15) and suppose that the approximate values $\bar{p} = -1.9$ and $\bar{q} = 1.9$
have been obtained in advance. From the two-column array

    p = -1.9
    q =  1.9
      a          b          c
       1         1          1
      -8        -6.10      -4.20
      39        25.51      15.63
     -62        -1.941     37.68
      50         1.531

which differs from the corresponding array in the Bairstow tabulation only in
that the entry S = 1.531 replaces the entry $b_4 = -2.157$, we obtain the rounded
data

$$\bar{c}_1 = -4.20 \qquad \bar{c}_2 = 15.63 \qquad \bar{c}_3 = 37.68 \qquad \bar{D} = 403$$

Hence, by use of (10.19.23), the modified iteration formulas follow in the form

$$\Delta p = \frac{1.542R + 0.274S}{b_{n-2}} \qquad \Delta q = \frac{-0.521R + 1.021S}{b_{n-2}}$$

and the tabulation of the first two iterations is obtained as follows:

    p =              -1.9                -2.00088      -1.999994
    q =               1.9                 2.00092       1.999992
      a          b          c             b
       1         1          1             1
      -8        -6.10      -4.20         -5.99912
      39        25.51      15.63         24.99556
     -62        -1.941     37.68          0.016875
      50         1.531                   -0.014116
    Δp =        -0.10088                  0.000886
    Δq =         0.10092                 -0.000928

In this case the simulated (stationary) Bairstow process, using (10.19.16) with
the same fixed values of the c’s, is somewhat less effective.
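
The modified Lin process can be sketched as follows. This is an illustrative
sketch only, with hypothetical function names, assuming n at least 3 and
using unrounded multipliers, so that the results differ slightly from those
obtained with the rounded multipliers 1.542, 0.274, −0.521, and 1.021. The
multipliers are formed once at the starting values as
(a_n/q)(c_{n-2} + p c_{n-3})/D, −(a_n/q) c_{n-3}/D,
−(a_n/q)(c̄_{n-1} + p c_{n-2})/D, and (a_n/q) c_{n-2}/D, with
D = c_{n-2}^2 − c̄_{n-1} c_{n-3}.

```python
def division_arrays(a, p, q):
    """Return ([b_0..b_{n-1}], [c_0..c_{n-2}]) for the trial factor x^2 + px + q."""
    n = len(a)
    b = [0.0, 1.0]                       # b_{-1}, b_0
    for k in range(1, n):                # appends b_1, ..., b_{n-1}
        b.append(a[k - 1] - p * b[-1] - q * b[-2])
    b = b[1:]
    c = [0.0, 1.0]                       # c_{-1}, c_0
    for k in range(1, n - 1):            # appends c_1, ..., c_{n-2}
        c.append(b[k] - p * c[-1] - q * c[-2])
    return b, c[1:]


def modified_lin_lambdas(a, p0, q0):
    """Multipliers (l11, l12, l21, l22), evaluated once at the starting values."""
    n = len(a)
    _, c = division_arrays(a, p0, q0)
    c_n2, c_n3 = c[n - 2], c[n - 3]
    c_bar = -p0 * c_n2 - q0 * c_n3       # modified last c entry
    D = c_n2 ** 2 - c_bar * c_n3
    s = a[-1] / (q0 * D)                 # a_n / (q D)
    return (s * (c_n2 + p0 * c_n3), -s * c_n3,
            -s * (c_bar + p0 * c_n2), s * c_n2)


def modified_lin_step(a, p, q, lam):
    """One step: a single quadratic synthetic division, then fixed multipliers."""
    b, _ = division_arrays(a, p, q)
    R, b_n2 = b[-1], b[-2]               # R = b_{n-1}, and b_{n-2}
    S = a[-1] - q * b_n2                 # S = a_n - q b_{n-2}
    l11, l12, l21, l22 = lam
    return (p + (l11 * R + l12 * S) / b_n2,
            q + (l21 * R + l22 * S) / b_n2)


a = [-8.0, 39.0, -62.0, 50.0]
lam = modified_lin_lambdas(a, -1.9, 1.9)
p1, q1 = modified_lin_step(a, -1.9, 1.9, lam)
p2, q2 = modified_lin_step(a, p1, q1, lam)
print(lam)
print(p1, q1)
print(p2, q2)
```

Only the b column is recomputed at each step; the c’s enter solely through
the fixed multipliers, which is the economy claimed for the method.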
10.20 Supplementary References 


For treatments of the great variety of available methods for solving sets of linear 
equations, see Wilkinson [1960, 1961, 1964, 1965] on direct methods and 
associated error analysis, and Varga [1962], Householder [1964], and Young
[1971] on iterative methods. Forsythe [1953] still provides a useful commentary.

The method of Crout [1941] was originally devised for use on desk 
calculators and sometimes is advocated only for that purpose in the current 
literature, in spite of the frequent use of this and related compact elimination 
methods in computer calculation when the processes of pivoting and equilibra- 
tion are suitably incorporated. See, for example, the relevant Algol procedures 
of Forsythe [1960] and McKeeman [1962] and the treatments in Forsythe 
and Moler [1967] and in Wilkinson [1967]. The material in Sec. 10.8 was taken 
from Hildebrand [1965]. Doolittle’s method, which is somewhat similar to 
that of Crout, is described in “Modern Computing Methods” [1961] and 
Ralston [1965]. 

Matrix eigenvalue (characteristic-value) problems, which are not treated 
here, are considered in detail in Householder [1964] and Wilkinson [1965], 
both of which include useful bibliographies. 

The notion of the order of an iterative process was introduced by Schröder
[1870]. Later contributions include those of Hamilton [1946], Bodewig [1949], 
Ehrmann [1959], and Traub [1964]. 

General treatments of numerical methods for solving nonlinear equations 
include Traub [1964], which presents an exhaustive study, classification, and 
compilation of iterative methods for determining a single zero, and Householder 
[1970], which deals with general methods intended principally for algebraic 
(polynomial) equations. 

Traub [1964] also considers measures of relative efficiency of iterative 
processes, which attempt to predict the ratio of the labor, time, or “cost” totals
associated with determinations effected by two different iterative processes, in 
terms of the orders and relative complexities of those processes. Other references 
are cited. 

The accelerative Δ² process, which apparently is over a century old, was
popularized by Aitken [1926] and has been generalized by Shanks [1955] and 
Wynn [1956] (see Sec. 5.12) and by others. See also Householder [1970]. 

Muller’s method (Muller [1956]) is formulated in terms of divided 
differences by Traub [1964], who also points out that other iterative processes 
can be conveniently recast or modified by use of the notation of divided dif- 
ferences and of their recursive properties. 

The iterative solution of sets of nonlinear equations is treated by Ortega 
and Rheinboldt [1970]. For specific numerical methods, see Booth [1949], 
Wolfe [1959], Kincaid [1961], Ostrowski [1966], and Rabinowitz [1970], 
the last of which also includes methods for single equations as well as a
bibliography of methods for the solution of nonlinear sets.

Useful methods of obtaining suitable “starting values” for the iterative
solution of nonlinear algebraic equations are given by Derwidué [1957] and 
Durand [1960]. Householder [1970] includes a chapter on the location of zeros 
of polynomials and lists many references, including Marden [1966]. 

Some interesting relationships between standard methods of solving 
algebraic equations and special properties of matrix eigenvalue problems are 
pointed out and exploited by Wilf [1960]. 

The method of false position is studied by Ostrowski [1966], and the 
secant method by Ostrowski [1966] and Barnes [1965]. Aitken [1926], in 
his analysis of Bernoulli’s method, includes his first advocation of the Δ²
process.

Graeffe’s method is analyzed by Ostrowski [1940], Bodewig [1946], 
and Hoel and Wall [1947], and is modified and systematized for computer use 
by Bareiss [1960, 1967]. The modifications of Brodetsky and Smeal [1924] 
and of Lehmer [1945, 1963] are included in Householder [1970].

Lin’s methods are studied by Aitken [1951, 1952] and are generalized 
to the extraction of higher-degree factors by Luke and Ufford [1951]. 

Among the many methods for solving algebraic equations which are not 
considered in this text are the following: Laguerre’s method, treated in Der- 
widué [1957], Durand [1960, 1961], Ostrowski [1966], and Householder 
[1970]; the always-convergent method of Lehmer and Schur, presented in 
Lehmer [1961]; and the QD (quotient-difference) method of Rutishauser 
[1956] and Henrici [1958, 1963, 1964], which also has many other applications. 
In addition, Traub [1966] gives a procedure which constructs a sequence of 
iterative processes, in correspondence with any given algebraic equation and 
any given initial starting value, which terminates with a process yielding
convergence to a root of the equation. This and related “global” procedures also
are considered in Householder [1970]. 

The evaluation of zeros of ill-conditioned polynomials is studied in 
Wilkinson [1959] and Rice [1965], and the effects of roundoff errors in the more 
general case are dealt with in Wilkinson [1964]. 

Methods similar to those of Bairstow [1914] and Hitchcock [1944] for 
the iterative extraction of quadratic factors of polynomials are given by Friedman 
[1949] and McAuley [1962]. Generalizations to a second-order process for 
quartic factors and to a third-order process for quadratic factors were effected 
by Salzer [1961] and are being repeatedly rediscovered. 

PROBLEMS

Section 10.2

1 Solve the following set of equations by use of determinants, without introducing
roundoffs:

$$\begin{aligned}
1.4x_1 + 2.3x_2 + 3.7x_3 &= 6.5 \\
3.3x_1 + 1.6x_2 + 4.3x_3 &= 10.3 \\
2.5x_1 + 1.9x_2 + 4.1x_3 &= 8.8
\end{aligned}$$

2 Determine D times the inverse of the coefficient matrix in Prob. 1 without
introducing roundoffs, where D = −0.249 is the determinant of that matrix.
Then use this matrix to obtain explicit expressions for $Dx_1$, $Dx_2$, and $Dx_3$ when
the respective right-hand members are replaced by $c_1$, $c_2$, and $c_3$, and check the
results when the c’s are assigned the values given. Also use this result to investigate
the significance of the solution if it is supposed that the given coefficients are
exact, but that the given right-hand members are only rounded numbers.

3 Show that the equations

$$\begin{aligned}
\omega x_1 + 3x_2 + x_3 &= 5 \\
2x_1 - x_2 + 2\omega x_3 &= 3 \\
x_1 + 4x_2 + \omega x_3 &= 6
\end{aligned}$$

possess a unique solution when ω ≠ ±1, that no solution exists when ω = −1,
and that infinitely many solutions exist when ω = 1. Also, investigate the
corresponding situation when the right-hand members are replaced by zeros.


Section 10.3 


4 By considering the result of increasing each x by unity in each equation of 
(10.2.1), establish the validity of the following gross-error check: 
If to each equation is adjoined an entry representing the sum of the coefficients and 
the right-hand member of that equation, and if the column of those entries is 
transformed under the Gauss (or Gauss-Jordan) reduction in the same way as the 
column of right-hand members, then, at each succeeding step, the transformed 
entry associated with any transformed equation will equal the sum of the coefficients 
and the right-hand member of that equation, except for the effects of intermediate 
roundoffs or gross errors. 

5 Solve the set of equations in Prob. 1 by the Gauss reduction, retaining only five 
decimal places in the intermediate calculation and using the gross-error check 
of Prob. 4. 

6 Proceed as in Prob. 5 with the following set of equations:

$$\begin{aligned}
8.467x_1 + 5.137x_2 + 3.141x_3 + 2.063x_4 &= 29.912 \\
5.137x_1 + 6.421x_2 + 2.617x_3 + 2.003x_4 &= 25.058 \\
3.141x_1 + 2.617x_2 + 4.128x_3 + 1.628x_4 &= 16.557 \\
2.063x_1 + 2.003x_2 + 1.628x_3 + 3.446x_4 &= 12.690
\end{aligned}$$


7 Repeat the calculation of Prob. 6, using the Gauss-Jordan reduction. 


Section 10.4 


8 Verify (10.4.9) numerically in the case of the coefficient matrix in (10.4.10). 
9,10 Proceed as in Probs. 5 and 6 using the Crout reduction. 


Section 10.5 


11,12 Assuming the given data to be exact, and starting with the approximate 
solutions of Probs. 9 and 10, obtain the solutions of those problems with 10-place 
accuracy by use of the Crout reduction. 

13 Solve the equations [compare (10.5.1)]

$$\begin{aligned}
10.01x_1 + 6.99x_2 + 8.01x_3 + 6.99x_4 &= 32 \\
6.99x_1 + 5.01x_2 + 5.99x_3 + 5.01x_4 &= 23 \\
8.01x_1 + 5.99x_2 + 10.01x_3 + 8.99x_4 &= 33 \\
6.99x_1 + 5.01x_2 + 8.99x_3 + 10.01x_4 &= 31
\end{aligned}$$

approximately, by the Crout method, rounding all entries in the auxiliary
matrix to only three decimal places; compare the results with the true solution
$x_1 = x_2 = x_3 = x_4 = 1$. Then calculate the residuals and verify the
effectiveness of the iterative process of Sec. 10.5 in this case, obtaining the
solution to three places.

14 Obtain a five-place solution of the set

$$\begin{aligned}
4.18x_1 + 2.87x_2 + 3.03x_3 + 2.11x_4 &= 27.45 \\
6.81x_1 + 4.67x_2 + 4.09x_3 + 1.63x_4 &= 34.94 \\
26.15x_1 + 17.96x_2 + 18.96x_3 + 19.94x_4 &= 198.71 \\
1.23x_1 + 2.06x_2 + 1.19x_3 + 6.32x_4 &= 34.20
\end{aligned}$$



by the Crout method, assuming all coefficients and right-hand members to be 
exact. 


Section 10.6 


15, 16 Determine the inverse of the coefficient matrix in Probs. 9 and 10 by the 
Crout reduction, retaining five decimal places. Also evaluate the determinant 
of the coefficient matrix in each case. 

17 Show that if A is the coefficient matrix of the equation set (10.5.1), then

$$A^{-1} = \begin{bmatrix}
25 & -41 & 10 & -6 \\
-41 & 68 & -17 & 10 \\
10 & -17 & 5 & -3 \\
-6 & 10 & -3 & 2
\end{bmatrix}$$

and also D = det A = 1.

18 Determine the inverse of the coefficient matrix of the equation set in Prob. 13
approximately, by the Crout method, rounding all entries in the auxiliary
matrix to only three decimal places. Then determine the matrix of the residuals
and investigate the effectiveness of the iterative process of Sec. 10.5. (Assume
that the coefficients of the equation set are exact.)


Section 10.7 


19, 20 Use the results of Probs. 15 and 16 to obtain approximate upper bounds on
the inherent errors relevant to the solutions of Probs. 9 and 10, assuming (a) that
the coefficients are exact and the errors in the right-hand members cannot exceed
ε in magnitude and (b) that the coefficients as well as the right-hand members
may be in error by as much as ±ε. In each case, determine what can be said
about the solution if the errors in the given data are due to roundoff.

21, 22 Reestimate the error bounds considered in Probs. 19 and 20 by use of the
inherent-error check column.

23 Use the result of Prob. 17 to obtain the information required in Probs. 19 and 20
for the equation set (10.5.1). Also verify that the results related to (10.5.2) and
(10.5.3) are consistent with the bounds so obtained.

24 If $x_1, \ldots, x_n$ satisfy the equations

$$\sum_{k=1}^{n} a_{ik} x_k = c_i \qquad (i = 1, 2, \ldots, n)$$

show that there follows

$$\sum_{k=1}^{n} a_{ik}\,\frac{\partial x_k}{\partial c_r} = \begin{cases} 0 & (i \ne r) \\ 1 & (i = r) \end{cases}$$

and

$$\sum_{k=1}^{n} a_{ik}\,\frac{\partial x_k}{\partial a_{rs}} = \begin{cases} 0 & (i \ne r) \\ -x_s & (i = r) \end{cases}$$

and deduce the relations

25 Use the results of Prob. 24 and the data of Prob. 15 to obtain approximations
to the changes in the values of $x_1$, $x_2$, and $x_3$ in Prob. 1 corresponding (a) to an
increase of 0.05 in $c_3 = 8.8$ and (b) to a decrease of 0.05 in $a_{23} = 4.3$.

26 Use the results of Prob. 24 and the data of Prob. 16 to obtain approximations
to the changes in the values of $x_1$, $x_2$, $x_3$, and $x_4$ in Prob. 6, corresponding (a) to
an increase of 0.001 in $c_3 = 16.557$ and (b) to a decrease of 0.001 in $a_{23} = 2.617$.


Section 10.8 


27 Determine a four-place solution of the following set of equations:

$$\begin{aligned}
3.955x_1 - 1.013x_2 &= 0.3068 \\
-1.007x_1 + 3.926x_2 - 1.023x_3 &= 0.8669 \\
-1.013x_2 + 3.887x_3 - 1.038x_4 &= 1.3168 \\
-1.021x_3 + 3.841x_4 &= 2.7997
\end{aligned}$$

28 If the interval [0, 1] is subdivided by the points $x_0 = 0$, $x_1 = 0.1$, $x_2 = 0.2$,
$x_3 = 0.3$, $x_4 = 0.5$, $x_5 = 0.7$, $x_6 = 0.8$, $x_7 = 0.9$, and $x_8 = 1.0$, and if the
function $f(x) = e^x$ is approximated by a spline s(x) with nodes at these points,
the spline slopes at the interior nodes are related by the equation set

$$\frac{1}{h_k}\,s'_{k-1} + 2\left(\frac{1}{h_k} + \frac{1}{h_{k+1}}\right)s'_k + \frac{1}{h_{k+1}}\,s'_{k+1} = 3\,\frac{f_k - f_{k-1}}{h_k^2} + 3\,\frac{f_{k+1} - f_k}{h_{k+1}^2}$$

for k = 1, 2, ..., 7, where $h_k = x_k - x_{k-1}$ [see (9.10.5)]. Determine five-
place values of the spline slopes at the seven interior nodes if the end conditions
$s'_0 = f'(0)$ and $s'_8 = f'(1)$ are imposed.


Section 10.9 


29 Determine a four-place solution of the equation set in Prob. 27 by use of Gauss-
Seidel iteration.

30 Proceed as in Prob. 29 by Jacobi iteration.

31 Investigate (empirically) the efficiency of the Gauss-Seidel iteration in the case
of the equations in Prob. 1.

32 Determine the solution of the equations in Prob. 27 to four places by use of a
relaxation procedure.

33 Experiment with the application of relaxation methods to the equations in
Prob. 1.


Section 10.10 


34 Suppose that the equation $x^2 + a_1 x + a_2 = 0$ possesses real roots α and β.
Show that the iteration $z_{k+1} = -(a_1 z_k + a_2)/z_k$ is stable at x = α if |α| > |β|,
the iteration $z_{k+1} = -a_2/(z_k + a_1)$ is stable at x = α if |α| < |β|, and the
iteration $z_{k+1} = -(z_k^2 + a_2)/a_1$ is stable at x = α if 2|α| < |α + β|.

35 With the notation of Prob. 34, show that the iteration

$$z_{k+1} = z_k - (z_k^2 + a_1 z_k + a_2)\,\varphi(z_k)$$

is stable at x = α if 0 < (α − β)φ(α) < 2, that the asymptotic convergence
factor is ρ = 1 − (α − β)φ(α), and that the three iterations of Prob. 34 are the
special cases in which φ(x) = 1/x, $1/(x + a_1)$, and $1/a_1$.

36 Show that if the asymptotic convergence factor ρ of an iteration can be estimated
in any way, then the formula

$$\alpha \approx z_{k+1} + \frac{\rho}{1 - \rho}\,(z_{k+1} - z_k)$$

can be used to accelerate the convergence of the iteration in place of the Aitken
Δ² process, and also that the latter process is equivalent to estimating ρ by the
ratio $(z_{k+2} - z_{k+1})/(z_{k+1} - z_k)$.

37 The real root α of the equation x + log x = 0 lies between 0.56 and 0.57. Show
that the iteration $z_{k+1} = -\log z_k$ is unstable at x = α, and verify this fact by
calculation. Then show that the iteration $z_{k+1} = e^{-z_k}$ is stable at x = α, and
determine α to five places.

38 Suppose that the solution of Prob. 37 is required but that only values of $\log_{10} x$
are to be used. Determine a convenient value of the constant c for which the
iteration

$$z_{k+1} = z_k - c\,(z_k + \log z_k) = (1 - c)\,z_k - c\,(\log 10)\,\log_{10} z_k$$

is stable at x = α, and use the result to determine α to five places.

39 Consider the application of the iteration

$$z_{k+1} = \frac{z_k^2 + 2}{3}$$

to the equation $x^2 - 3x + 2 = 0$.

(a) Show that this iteration is asymptotically stable at α = 1 but unstable at
α = 2.

(b) Show that $z_k \to 1$ as $k \to \infty$ if $-2 < z_0 < 2$. (Prove that then $z_{k+1}$ is
between $z_k$ and 1 when k ≥ 1.)

(c) Show that $z_{k+1} = 2$ if $z_0 = \pm 2$, but that convergence to α = 2 for any
other value of $z_0$ is impossible.

40 Consider the iterative solution of the equation tan x = x.

(a) By superimposing the graphs of y = x and y = tan x, or otherwise, show
that the rth positive root of this equation is in the interval $[r\pi, (r + \tfrac12)\pi]$.

(b) Show that the iteration

$$z_{k+1} = r\pi + \tan^{-1} z_k$$

is stable for the determination of the rth positive root $\alpha_r$.

(c) With $[a, b] = [r\pi, (r + \tfrac12)\pi]$ and $F(x) = r\pi + \tan^{-1} x$, show that when
a ≤ x ≤ b it is true that both a < F(x) < b and 0 < F′(x) < 1. Hence
deduce in two ways that convergence to $\alpha_r$ is assured if $a \le z_0 \le b$.

(d) Use the iteration of part (b) to determine both $\alpha_1$ and $\alpha_2$ to five decimal
places.

(The principal value of $\tan^{-1} x$, for which $-\tfrac12\pi < \tan^{-1} x < \tfrac12\pi$, is to be
presumed.)
41 Consider the polynomial $f(x) = x^5 + 5x - 1$.
(a) Prove that $f(x)$ has exactly one real zero $\alpha$ and that $0.1 < \alpha < 0.2$.
(b) Without more closely locating $\alpha$, prove that the iteration

$$z_{k+1} = z_k - cf(z_k)$$

will converge to $\alpha$ if $0 < c < 1/5.008$ and if $0.1 \le z_0 \le 0.2$.

(c) With the choice $c = 1/5.01$, show that the asymptotic convergence factor is
between $4 \times 10^{-4}$ and $2 \times 10^{-3}$, so that ultimately each iteration will
provide three or four additional correct decimal places.

(d) Verify that, with $c = 1/5.01$, two iterations provide 10-place accuracy when
$z_0 = 0.2$, while three are needed when $z_0 = 0.1$.
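Part (d) of Prob. 41 lends itself to a direct numerical check. The sketch below is one way to do it; the helper name `iterate` and the use of a long run of the same iteration as the reference value of the zero are assumptions, and "10-place accuracy" is read here as an error below $0.5 \times 10^{-10}$.

```python
def f(x):
    return x**5 + 5.0 * x - 1.0


def iterate(z0, steps, c=1.0 / 5.01):
    """Apply z_{k+1} = z_k - c f(z_k) for the given number of steps."""
    z = z0
    for _ in range(steps):
        z = z - c * f(z)
    return z


# Reference value of the zero: run the iteration far longer than required.
alpha = iterate(0.2, 50)

err_a2 = abs(iterate(0.2, 2) - alpha)  # two steps from z0 = 0.2
err_b2 = abs(iterate(0.1, 2) - alpha)  # two steps from z0 = 0.1
err_b3 = abs(iterate(0.1, 3) - alpha)  # three steps from z0 = 0.1
```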

42 Indicate by geometrical (or analytical) arguments why the following assertions
are valid:

(a) If $f''(\alpha) \ne 0$, the false-position iteration ultimately has one stationary point.
This situation occurs when $f''(x) \ne 0$ between $z_k$ and $z_{k+1}$, and the stationary
point is that one of $z_k$ and $z_{k+1}$ at which $ff'' \ge 0$. (See Fig. 10.2.)

(b) If at some stage the secant method is based on abscissas $z_k$ and $z_{k+1}$ which
are on the same side of a zero $x = \alpha$, if $ff'' \ge 0$ at $z_k$ and at $z_{k+1}$, and if
$f'$ and $f''$ are of constant sign in the interval spanned by $\alpha$, $z_k$, and $z_{k+1}$, then
convergence to $\alpha$ is certain.

43 Compare sequences generated by the false-position and secant methods when
$f(x) = x^5 + 5x - 1$. Take $z_0 = 0$ and $z_1 = 0.5$ and obtain a five-place
approximation to $\alpha$ by each method.

44 Deal with the "accelerated false-position" iteration process as follows:

(a) Complete the indicated derivation of (10.10.18).

(b) By writing $z_k = \alpha - \varepsilon_k$ and $f_k = f(\alpha - \varepsilon_k) = f(\alpha) - \varepsilon_k f'(\alpha) + \cdots
= -\varepsilon_k f'(\alpha) + \cdots$ in (10.10.19), show that


Ck+a as 1 Eg f(a) 
Ex (1 -- &+1/&)fo 


when $f'(\alpha) \ne 0$, and deduce (10.10.21) and (10.10.22).

(c) Apply (10.10.19) to the equation of Prob. 43, obtaining a five-place approx- 
imation to the real root and comparing the process with the two processes 
used in that problem in terms of efficiency in this case. 

45 Determine the number of steps which would be required if the bisection method
were used for the calculation of Prob. 43, and obtain the results of the first five
steps.


Section 10.11 


46 Repeat the determination of Probs. 37 and 38 using the Newton-Raphson iteration
both with $f(x) = x + \log x$ and with $f(x) = x - e^{-x}$.
47 Show that the Newton-Raphson iterations, as applied to $f(x) = x^n - a$ and to
$f(x) = 1 - (a/x^n)$, for the determination of $\alpha = a^{1/n}$, are of the respective
forms

$$z_{k+1} = \frac{1}{n}\left[(n - 1)z_k + \frac{a}{z_k^{n-1}}\right]$$

and

$$z_{k+1} = \frac{1}{n}\left[(n + 1)z_k - \frac{z_k^{n+1}}{a}\right]$$

and that, if $\varepsilon_k = \alpha - z_k$, there follows approximately

$$\varepsilon_{k+1} \approx -\frac{n - 1}{2\alpha}\varepsilon_k^2$$

and

$$\varepsilon_{k+1} \approx \frac{n + 1}{2\alpha}\varepsilon_k^2$$

respectively, when $z_k \approx \alpha$. (Notice that the second iteration formula possesses a
constant denominator and that the two sequences approach $\alpha$ from above and
from below, respectively, when $a > 0$.) Also use both iterations to determine
(3.4765)!/> and (0.049672)1/2 to five places. 
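The two recurrences of Prob. 47 can be compared side by side with a short sketch. The function names, the fixed step count, and the starting values are illustrative assumptions; only the square-root case is exercised here.

```python
def root_from_above(a, n, z0, steps=60):
    """Newton-Raphson applied to f(x) = x**n - a."""
    z = z0
    for _ in range(steps):
        z = ((n - 1) * z + a / z ** (n - 1)) / n
    return z


def root_from_below(a, n, z0, steps=60):
    """Newton-Raphson applied to f(x) = 1 - a / x**n.

    The only divisions are by the constants a and n.
    """
    z = z0
    for _ in range(steps):
        z = ((n + 1) * z - z ** (n + 1) / a) / n
    return z


a = 0.049672
upper = root_from_above(a, 2, 1.0)
lower = root_from_below(a, 2, 0.2)

# After a single step the iterates already lie on opposite sides of the root.
first_above = root_from_above(a, 2, 0.2, steps=1)
first_below = root_from_below(a, 2, 0.2, steps=1)
```

The second form needs a starting value reasonably close to the root, whereas the first converges from any positive start when $a > 0$.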
48 By applying the Newton-Raphson procedure to $f(x) = 1 - 1/(ax)$, obtain the
recurrence formula

$$z_{k+1} = z_k(2 - az_k)$$

for the iterative determination of the reciprocal of $a$ without effecting division,
and show that, if $\varepsilon_k$ denotes the error in $z_k$, there follows $\varepsilon_{k+1} \approx a\varepsilon_k^2$ when
$z_k \approx 1/a$. Also show that the iteration will converge to $1/a$ if $0 < z_0 < 2/a$.
Does it converge when $z_0 = 0$ or $z_0 = 2/a$?
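A minimal sketch of the division-free reciprocal iteration of Prob. 48 follows. The function name `reciprocal`, the value $a = 4$, and the step count are assumptions chosen so that the arithmetic is easy to follow by hand.

```python
def reciprocal(a, z0, steps=8):
    """Approximate 1/a with z_{k+1} = z_k (2 - a z_k); no division is performed."""
    z = z0
    history = [z]
    for _ in range(steps):
        z = z * (2.0 - a * z)
        history.append(z)
    return z, history


approx, history = reciprocal(4.0, 0.2)    # 0 < z0 < 2/a = 0.5
stuck_at_zero, _ = reciprocal(4.0, 0.0)   # z0 = 0 remains at 0
jumps_to_zero, _ = reciprocal(4.0, 0.5)   # z0 = 2/a gives z1 = 0
```

With $z_0 = 0.2$ the errors are 0.05, 0.01, 0.0004, ..., each equal to $a$ times the square of its predecessor.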

49 Determine the smallest root of the equation $\tan x = cx$ to five places, with
$c = 1.01$, $c = 2$, and $c = 30$.


Determine all real roots of the following equations to five places:

50 $x^3 - 2x - 5 = 0$†

51 $x^3 - 9x^2 + 18x - 6 = 0$

52 $x^4 - 16x^3 + 72x^2 - 96x + 24 = 0$

53 $x^4 - 3x + 1 = 0$

x? — 3x — 4sin? x = 0 

55 Suppose that $f(x)$ possesses two zeros $\alpha_1$ and $\alpha_2$ which are nearly coincident, so
that $f'(x)$ vanishes at a point $\beta$ between $\alpha_1$ and $\alpha_2$. By making use of the relation

$$f(\alpha) = f(\beta) + (\alpha - \beta)f'(\beta) + \frac{(\alpha - \beta)^2}{2}f''(\beta) + \cdots$$


†This equation was used by Wallis in 1685 to illustrate the Newton-Raphson method
and has been included as an example in most subsequent works dealing with the 
numerical solution of equations. 
show that if $\beta$ is determined first, then initial approximations to the nearby
zeros of $f(x)$ are given by

$$\alpha_{1,2} \approx \beta \pm \left[-\frac{2f(\beta)}{f''(\beta)}\right]^{1/2}$$

if $f''(\beta) \ne 0$, and are real if $f(\beta)$ and $f''(\beta)$ are of opposite sign, after which
improved values may be obtained by an appropriate iterative method. [Note
that the case of a double root $\alpha$ is also included since then $f(\beta) = 0$ and $\beta = \alpha$.]
Also use this procedure to determine the two real roots of the equation

$$3x^4 + 8x^3 - 6x^2 - 25x + 19 = 0$$

(which are near $x = 1$) to five places.
56 The equation

$$2x^4 + 24x^3 + 61x^2 - 16x + 1 = 0$$

has two nearly coincident zeros near $x = 0.1$. Determine them to five decimal
places by the method of Prob. 55.
57 Proceed as in Prob. 55 with the root pair of the equation

$$x^6 - 16x^3 + x^2 + 59 = 0$$

which is near $x = 2$.
58 The equation

$$x^4 - 8.2x^3 + 39.41x^2 - 62.26x + 30.25 = 0$$

has a double root near $x = 1$. Determine it to five places by the method of
Prob. 55.


Section 10.12 


59 If $x = \alpha$ is a root of $f(x) = 0$, if successive approximations to $\alpha$ are generated
by the iteration $z_{k+1} = F(z_k)$, and if $F(x)$ possesses $r$ continuous derivatives and
is such that

$$F(\alpha) = \alpha \qquad F'(\alpha) = F''(\alpha) = \cdots = F^{(r-1)}(\alpha) = 0 \qquad F^{(r)}(\alpha) \ne 0$$

for some $r$, show that

$$\alpha - z_{k+1} = \frac{(-1)^{r+1}}{r!}(\alpha - z_k)^r F^{(r)}(\xi_k)$$

where $\xi_k$ lies between $z_k$ and $\alpha$, and that the iteration is a process of order $r$.
(This is the way in which order was first defined by Schröder [1870]. Here $r$
must be an integer.)

60 With the notation of Prob. 59, show that the iteration corresponding to the
definition

$$F(x) = x - \phi_1(x)f(x) - \phi_2(x)[f(x)]^2 - \phi_3(x)[f(x)]^3 - \cdots$$

is at least of second order if

$$1 - \phi_1 f' = 0$$

at least of third order if also

$$2\phi_1' f' + \phi_1 f'' + 2\phi_2 (f')^2 = 0$$

and at least of fourth order if further

$$3\phi_1'' f' + 3\phi_1' f'' + \phi_1 f''' + 6\phi_2' (f')^2 + 6\phi_2 f' f'' + 6\phi_3 (f')^3 = 0$$

under the assumption that the $\phi$'s and an appropriate number of their derivatives
are finite at $x = \alpha$ and that $f'(\alpha) \ne 0$. Thus deduce that the formula

$$z_{k+1} = z_k - \frac{f_k}{f_k'} - \frac{f_k''}{2f_k'}\left(\frac{f_k}{f_k'}\right)^2 - \left[\frac{(f_k'')^2}{2(f_k')^2} - \frac{f_k'''}{6f_k'}\right]\left(\frac{f_k}{f_k'}\right)^3 - \cdots$$

with $f_k^{(r)} = f^{(r)}(z_k)$, then yields a process of order equal to the number of terms
retained in the right-hand member.
61 Rederive the formula of Prob. 60 by writing $z_{k+1} - z_k = h$ and

$$f(z_k + h) = f_k + hf_k' + \frac{h^2}{2!}f_k'' + \frac{h^3}{3!}f_k''' + \cdots$$

assuming an expansion of the form $h = -(\phi_1 f_k + \phi_2 f_k^2 + \phi_3 f_k^3 + \cdots)$,
requiring that the coefficients of successive powers of $f_k$ vanish in the result of
substituting the second expansion into the first, and so obtaining the conditions
$1 - \phi_1 f_k' = 0$, $2\phi_2 f_k' - \phi_1^2 f_k'' = 0$, $6\phi_3 f_k' - 6\phi_1\phi_2 f_k'' + \phi_1^3 f_k''' = 0, \ldots$.


62 Consider the convergence of the sequence $\{z_k\}$ for which


Zea1 = 10 — 19z, + 1422 — 32} 

as follows:
(a) If the sequence converges to a limit $\alpha$, determine all possible values of $\alpha$.
(b) For each such possible value of $\alpha$, find the order of the iteration process.
(c) Determine to which of the possible limits the convergence of an infinite
sequence is in fact possible.
63 Use the formula of Prob. 60 to approximate the real root of $x^3 - x - 1 = 0$,
taking $z_0 = 1.3$ and calculating separately the approximations to $z_1$ afforded by
retention of one, two, and three correction terms. Also investigate the
approximations corresponding to the choice $z_0 = 1$.
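The calculation requested in Prob. 63 can be sketched as follows, assuming the correction terms of Prob. 60 in the standard form $u$, $\frac{f''}{2f'}u^2$, $\left[\frac{(f'')^2}{2(f')^2} - \frac{f'''}{6f'}\right]u^3$ with $u = f/f'$. The function names are hypothetical.

```python
def f(x):
    return x**3 - x - 1.0


def df(x):
    return 3.0 * x**2 - 1.0


def d2f(x):
    return 6.0 * x


def d3f(x):
    return 6.0


def one_step(z, terms):
    """One step of the series of Prob. 60, keeping 1, 2, or 3 correction terms."""
    u = f(z) / df(z)
    corrections = [
        u,
        d2f(z) / (2.0 * df(z)) * u**2,
        (d2f(z) ** 2 / (2.0 * df(z) ** 2) - d3f(z) / (6.0 * df(z))) * u**3,
    ]
    return z - sum(corrections[:terms])


# Approximations to z_1 from z_0 = 1.3 with one, two, and three correction terms.
estimates = [one_step(1.3, t) for t in (1, 2, 3)]
```

Keeping one term reproduces the Newton-Raphson step; each further term reduces the error of the single step by roughly another factor of $f/f'$.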
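64 Use the results of Prob. 60, with $f(x) = x^2 - a$ and with $f(x) = 1 - (a/x^2)$, to
obtain third-order iterations leading to $\alpha = a^{1/2}$ in the forms

$$z_{k+1} = z_k - \frac12 z_k\left(1 - \frac{a}{z_k^2}\right) - \frac18 z_k\left(1 - \frac{a}{z_k^2}\right)^2$$

and

$$z_{k+1} = z_k + \frac12 z_k\left(1 - \frac{z_k^2}{a}\right) + \frac38 z_k\left(1 - \frac{z_k^2}{a}\right)^2$$

and use them to determine $(16.324)^{1/2}$ and $(0.049672)^{1/2}$ to four places.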
65 Use the results of Prob. 60 with $f(x) = 1 - 1/(ax)$ to obtain the third-order
iteration

$$z_{k+1} = z_k(3 - 3az_k + a^2z_k^2)$$

for the approximate calculation of $\alpha = 1/a$, and show that if $\varepsilon_k = \alpha - z_k$, then
$\varepsilon_{k+1} = a^2\varepsilon_k^3$. Also account for the behavior of the iteration sequence when
$z_0 = 2/a$.

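66 By writing $z_k = \alpha - \varepsilon_k$ in the iteration formula and expanding the result in
powers of $\varepsilon_k$, show that in the Newton-Raphson iteration there follows

$$\varepsilon_{k+1} = \varepsilon_k + \frac{f(\alpha) - f'(\alpha)\varepsilon_k + \tfrac12 f''(\alpha)\varepsilon_k^2 - \cdots}{f'(\alpha) - f''(\alpha)\varepsilon_k + \tfrac12 f'''(\alpha)\varepsilon_k^2 - \cdots}$$

when $\varepsilon_k$ is sufficiently small, assuming appropriate differentiability of $f(x)$.
Thus deduce the following facts when convergence is present:
(a) If $f'(\alpha) \ne 0$ and $f''(\alpha) \ne 0$, then

$$\varepsilon_{k+1} \approx -\frac{f''(\alpha)}{2f'(\alpha)}\varepsilon_k^2$$

in accordance with (10.11.4).
(b) If $f'(\alpha) \ne 0$, $f''(\alpha) = 0$, and $f'''(\alpha) \ne 0$, then

$$\varepsilon_{k+1} \approx \frac{f'''(\alpha)}{3f'(\alpha)}\varepsilon_k^3$$

so that the process then is of order 3.
(c) If $f(\alpha) = f'(\alpha) = \cdots = f^{(m-1)}(\alpha) = 0$ and $f^{(m)}(\alpha) \ne 0$, with $m > 1$, so
that $\alpha$ is of multiplicity $m$, then

$$\varepsilon_{k+1} \approx \left(1 - \frac{1}{m}\right)\varepsilon_k$$

so that the process is of first order, with an asymptotic convergence factor
$(m - 1)/m$.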
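The first-order behavior asserted in Prob. 66(c) is easy to observe numerically. The sketch below uses an assumed example, $f(x) = (x - 1)^2(x + 2)$ with a double zero at $x = 1$, which is not taken from the text; the error ratios should then approach $(m - 1)/m = \tfrac12$.

```python
def newton_errors(f, df, alpha, z0, steps):
    """Errors alpha - z_k of the Newton-Raphson iterates z_0, ..., z_steps."""
    z = z0
    errors = [alpha - z]
    for _ in range(steps):
        z = z - f(z) / df(z)
        errors.append(alpha - z)
    return errors


def f(x):
    # Assumed example with a zero of multiplicity m = 2 at x = 1.
    return (x - 1.0) ** 2 * (x + 2.0)


def df(x):
    # Derivative of x**3 - 3x + 2, the expanded form of f.
    return 3.0 * x * x - 3.0


errors = newton_errors(f, df, 1.0, 1.5, 12)
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
```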
67 Rederive the result of Prob. 66(c) by writing $f(x) = (x - \alpha)^m g(x)$, where
$g(\alpha) \ne 0$, and $F(x) = x - f(x)/f'(x)$ in the result of Prob. 59. [Show that
$r = 1$ and $F'(\alpha) = (m - 1)/m$.]
68 If $\alpha$ is a zero of $f(x)$ of multiplicity $m > 1$, show that the two modified
Newton-Raphson iterations

$$z_{k+1} = z_k - m\frac{f(z_k)}{f'(z_k)}$$

and

$$z_{k+1} = z_k - \frac{h(z_k)}{h'(z_k)} \qquad h(x) = \frac{f(x)}{f'(x)}$$

are both of order 2 or greater. (Use either the method of Prob. 66 or the method
of Prob. 67. The first modification has the disadvantage that the value of $m$
must be known; and both are somewhat objectionable for computation since the
numerator and denominator of the correction term both tend to zero as $z_k \to \alpha$
unless an analytical simplification is possible.)

69 By use of the method of Prob. 66 (or otherwise), establish the following facts for
the secant method:

(a) If $f'(\alpha) \ne 0$ and $f''(\alpha) \ne 0$, then

$$\varepsilon_{k+1} \approx -\frac{f''(\alpha)}{2f'(\alpha)}\varepsilon_k\varepsilon_{k-1}$$

in accordance with (10.12.2), and hence (10.12.4) follows.
(b) If $\alpha$ is a zero of $f(x)$ of multiplicity $m > 1$, then

$$\frac{\varepsilon_{k+1}}{\varepsilon_k} \approx \frac{(\varepsilon_k/\varepsilon_{k-1})^{m-1} - 1}{(\varepsilon_k/\varepsilon_{k-1})^m - 1}$$

so that the process is of order 1, with

$$\varepsilon_{k+1} \approx A\varepsilon_k$$

where the asymptotic convergence factor $A$ is the real root of the equation

$$A^m + A^{m-1} - 1 = 0$$

In particular, $A = 0.618$ when $m = 2$ and $A = 0.755$ when $m = 3$.
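The quoted values of the convergence factor $A$ in Prob. 69(b) can be checked with a short bisection sketch. The function name `secant_factor` and the tolerance are assumptions; the bracket $(0, 1)$ is valid because the left member is $-1$ at $A = 0$ and $+1$ at $A = 1$ and increases in between.

```python
def secant_factor(m, tol=1e-12):
    """Root in (0, 1) of A**m + A**(m - 1) - 1 = 0, found by bisection."""

    def g(A):
        return A**m + A ** (m - 1) - 1.0

    lo, hi = 0.0, 1.0  # g(0) = -1 < 0 and g(1) = 1 > 0 for m > 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


factor_double = secant_factor(2)
factor_triple = secant_factor(3)
```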
70 Calculate two iterates approximating the zero $\alpha = 1$ of the equation $x^5 - 1 = 0$
using Muller's method, taking $z_0 = 1.05$, $z_1 = 0.95$, and $z_2 = 0.98$; compare
them with the Newton-Raphson values 1.00083 and 1.0000014.
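The Newton-Raphson reference values quoted in Prob. 70 can be reproduced with the sketch below. It assumes the starting value $z_0 = 0.98$, which the problem does not state explicitly but which is the starting value used in the single-point methods of the neighboring problems.

```python
def newton_step(z):
    """One Newton-Raphson step for f(x) = x**5 - 1."""
    return z - (z**5 - 1.0) / (5.0 * z**4)


z1 = newton_step(0.98)  # assumed starting value z_0 = 0.98
z2 = newton_step(z1)
```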

71 Proceed as in Prob. 70, using the semiconfluent process associated with (10.12.10)
and (10.12.14), taking $z_0 = 0.95$ and $z_1 = 0.98$.

72 Use the confluent formula (10.12.18) in Prob. 70, taking $z_0 = 0.98$.

73 Use Halley's formula (10.12.20) in Prob. 70, taking $z_0 = 0.98$.

74 Use Chebyshev's formula (10.12.22) in Prob. 70, taking $z_0 = 0.98$.

75 Use Traub's formula (10.12.25) in Prob. 70, taking $z_0 = 1.05$, $z_1 = 0.95$, and
$z_2 = 0.98$.

76 Ostrowski’s method. Verify that the equation


»-πκ 7.32. Ξ Bay KZ 
defines $y(x)$ as a rational function of $x$ such that $y = f$ when $x = z_k$ and
$x = z_{k-1}$, and also $y' = f'$ when $x = z_k$. By setting $y(x) = 0$ and identifying the
resultant $x$ with $z_{k+1}$, deduce the iteration formula


$$z_{k+1} = \frac{z_k - \mu z_{k-1}}{1 - \mu}$$
where 
lifer ἢ 
[Here it is true that


Ν f(a) _ f"(@) 2 5 
ae ΓΞ ἔπ ΚΣ 


and the order of the process is $r = 1 + \sqrt{2} \approx 2.414$. (See Ostrowski [1966],
Sec. 11.2.) The confluent form of this formula is Halley’s formula, as was seen 
in Sec. 9.17. ] 

77 Use Ostrowski’s method (Prob. 76) in Prob. 70, taking $z_0 = 0.95$ and $z_1 = 0.98$.


Section 10.13 


78 Determine $F(x, y)$ and $G(x, y)$ such that the Newton-Raphson iteration for a
solution $(\alpha, \beta)$ of the equations $f(x, y) = 0$ and $g(x, y) = 0$ is expressed in the
form

$$x_{k+1} = F(x_k, y_k) \qquad y_{k+1} = G(x_k, y_k)$$

and show that $F_x$, $F_y$, $G_x$, and $G_y$ vanish when $(x, y) = (\alpha, \beta)$ in nonexceptional
cases.
79 Determine to five places the real solution of the equations

$$x = \sin(x + y) \qquad y = \cos(x - y)$$

80 Determine to five places the real solution of the equations

$$4x^3 - 27xy^2 + 25 = 0 \qquad 4x^2y - 3y^3 - 1 = 0$$

in the first quadrant.

81 Determine to five places the real solution of the equations

$$\sin x \sinh y = 0.2 \qquad \cos x \cosh y = 1.2$$

nearest the origin.

82 Determine approximate coordinates of the intersections of the curves $y^2 - x^2 = 4$
and $(x - 1)^2 + (y - 1)^2 = 1.34$ as follows:

(a) Plot the curves, together with the jacobian locus $J(f, g) = 0$, where $f$ and $g$
are the left-hand members of the given equations; verify that the two curves
have two intersections near the point (0.7, 2.1).

(b) Use the method leading to (10.13.19) and (10.13.20) to determine first the
coordinates of the intersection of $f = 4$ and $J = 0$ to five decimal places and
then corresponding approximations to the coordinates of the required
intersections. Finally, use an iterative method to improve these approximations,
as is necessary, so that five-place values are obtained.

(For purposes of this problem, avoid shortcuts or alternative procedures
permitted by the fact that $f$ and $g$ are quadratic.)

83 Use the method of false position to determine one solution of the equation
pair in Prob. 82 to four decimal places, starting with points at which $x = 0.63$
and 0.64.


84 Calculate two false-position iterates in the case of Prob. 80, starting with $x = 1.0$
and 1.1.

85 Determine the result of one step by the method of steepest descent for the
solution of the simultaneous equations

$$x^2 + y^2 = 2 \qquad x - y = 0$$

taking $\phi = \tfrac12[(x^2 + y^2 - 2)^2 + (x - y)^2]$ with $x_0 = 0.9$ and $y_0 = 1.1$. Also
compare the initial and modified values of $\phi$.


Section 10.14 
86 Verify that if the coefficients of the polynomial $f(x)$ and/or the parameter $z$ are
complex, and if the notation 


ας = a, + id, b, = b, + ih, z= ut iv 
is used, the recurrence formula (10.14.6) is replaced by the formulas 
δι = a + ub,_, — vby_4 b, = ἂς + ub,_1 + vby_1 

with bo = 1 and 6, = 0. Also use this procedure to evaluate 
f(x) = x9 + (2 + 3i)x? + ( — dx + 4 -- 3d) 


when x = 2.24 + 1.383. 
87 Determine to five decimal places the minimum value of the function

$$f(x) = x^4 - 2.2x^3 + 2.24x^2 - 2.24x + 1.23$$

[Suggestion: Use the Birge-Vieta procedure to solve the equation $f'(x) = 0$.
Also consider the sign of $f''(x)$.]

88 Suppose that a polynomial $f(x)$ is such that $f'(x) = 0$ at a point $\beta$ near which
$f(x)$ has (or may have) two nearly equal zeros.

(a) With the notation of Sec. 10.14, show that if β is evaluated first, the values 
of two nearby zeros are given approximately by 


if $R'' \ne 0$, and are real if $RR'' < 0$. (Compare with Prob. 55.)
(b) Apply the result of part (a) to the determination of five-place values of the 
real zeros of the polynomial 


$$f(x) = x^4 - 2.2x^3 + 2.24x^2 - 2.24x + 1.21$$

(Compare with Prob. 87.)
89 Show that the result of replacing $x$ by $t + c$ in

$$f(x) = x^n + a_1x^{n-1} + \cdots + a_{n-1}x + a_n$$

is of the form $f(t + c) = t^n + R^{(n-1)}t^{n-1} + R^{(n-2)}t^{n-2} + \cdots + R't + R$, where
the coefficients can be determined by continuing the process leading to (10.14.5)
until it terminates, with $\alpha$ replaced by $c$. Also illustrate this procedure in the
case when

$$f(x) = x^3 - x - 1$$

and $c = 1.3$, showing that the calculations may be arranged as follows:

     1      1       1      1     1
     0      1.3     2.6    3.9
    -1      0.69    4.07
    -1     -0.103
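The tabulation of Prob. 89 amounts to repeated synthetic division by $x - c$, each pass being one shorter than the last. A minimal sketch follows; the function name `shift_polynomial` and the in-place list layout are assumptions made for illustration.

```python
def shift_polynomial(coeffs, c):
    """Coefficients of f(t + c), highest power first, by repeated synthetic division.

    coeffs lists the coefficients of f(x), highest power first.
    """
    work = list(coeffs)
    n = len(work) - 1
    for end in range(n, 0, -1):
        # One synthetic division by (x - c); its remainder lands in work[end].
        for k in range(1, end + 1):
            work[k] += c * work[k - 1]
    return work


# f(x) = x^3 - x - 1 with c = 1.3, as in the illustration of the problem.
shifted = shift_polynomial([1.0, 0.0, -1.0, -1.0], 1.3)
```

Each pass of the outer loop reproduces one column of the table; the last entry of each column is the corresponding coefficient of the shifted polynomial.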


90 Apply the Lin iteration to the equation

$$t^3 + 3.9t^2 + 4.07t - 0.103 = 0$$

obtained in Prob. 89, starting with $t_0 = 0$. Thus obtain the real root of the
equation $x^3 - x - 1 = 0$ to five places.

91 Verify that the modified Lin method specified by (10.14.18) and (10.14.20) can
be derived by applying the Newton-Raphson formula to $R/b_{n-1}$,

$$z^* = z - \frac{R/b_{n-1}}{(R/b_{n-1})'}$$

evaluating the derivative $(R/b_{n-1})'$ at $z = \alpha$, and then replacing $\alpha$ by $\bar\alpha$ in the
result. [Note that $b_{n-1} = (R - a_n)/z$ and that $R = f(z)$ and $a_n = f(0)$.]
92 Show that the recurrence formula (10.14.21) takes the approximate form
$z^* = z - 0.246R$ when applied to the iterative solution of (10.10.6) with
$\bar\alpha = 1.3$, and verify its efficiency in this case by obtaining the real zero to seven
places.

93 For the equation

$$(x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6 = 0$$

show that the Lin iteration is stable for the determination of $\alpha_1 = 1$ and $\alpha_3 = 3$,
but is unstable for $\alpha_2 = 2$. Then calculate the results of two iterations using
(a) Lin's method, (b) the modified Lin method of (10.14.18), and (c) the formula
(10.14.21), starting with the initial approximations $\bar\alpha_1 = 0.9$, $\bar\alpha_2 = 1.9$, and
$\bar\alpha_3 = 2.9$. Show that the modified Lin method is best for the determination of
$\alpha_1$ and $\alpha_2$, but that the original Lin method is best for $\alpha_3$; account for the last
fact. [Determine the order of the Lin method in this case (when $\alpha = 3$).]

94 Use the result of Prob. 60 to devise a third-order iteration process extending
(10.14.10), of the form 
where $R'' = d_{n-2}$ and $d_k = c_k + zd_{k-1}$. Also determine the real zero of
$x^3 - x - 1$ to six decimal places by this method, starting with $z = 1.3$.


95 to 98 Determine the real roots of the equations in Probs. 50 to 53 to five places
by use of any method exploiting synthetic division.


Section 10.15 


99 Show that the formula (10.15.3) would provide the estimate $\delta\alpha \approx -5$ when
$\alpha = -20$ and $\delta a_1 \approx 1.2 \times 10^{-7}$ in the Wilkinson example cited in Sec. 10.16.
Also verify that with this estimate, the second-order term neglected in the
approximation

$$(\alpha + \delta\alpha)^{20} = \alpha^{20} + 20\alpha^{19}\,\delta\alpha + 190\alpha^{18}(\delta\alpha)^2 + \cdots \approx \alpha^{20} + 20\alpha^{19}\,\delta\alpha$$

[which is one of several approximations underlying (10.15.3) in this case] is
about equal in magnitude to the first-order term retained in that case, so that the
estimate (appropriately) should not be accepted quantitatively in this case, but
only as a strong danger signal.

100 The equation

$$x^4 - 15.2x^3 + 59.7x^2 - 81.6x + 36.0 = 0$$

has roots approximated by 1.0, 1.2, 3, and 10. Determine approximately the
maximum inherent error in each root, assuming (a) that each coefficient in the
equation (except the leading one) may be in error by $\pm 0.1$ and (b) that each such
coefficient is correct within 1 percent.

101 If $\bar\alpha$ is an approximation to a root $\alpha$ of the equation $f(x) = 0$, and if $f(\bar\alpha) = \varepsilon$,
show that $\alpha - \bar\alpha = -\varepsilon/f'(\xi)$, where $\xi$ is between $\bar\alpha$ and $\alpha$ if $f'(x)$ is continuous.

102 Suppose that an approximation $\bar\alpha_1$ to a zero $\alpha_1$ of a polynomial $f(x)$ has been
found, that the polynomial $g_1(x)$ is the consequence of dividing $f(x)$ by $x - \bar\alpha_1$
and ignoring the constant remainder, and that an approximation $\bar\alpha_2$ to another
zero $\alpha_2$ of $f(x)$ is obtained as an exact zero of $g_1(x)$. By writing

$$f(x) = (x - \bar\alpha_1)g_1(x) + f(\bar\alpha_1)$$

and using the result of Prob. 101, show that

$$\alpha_2 - \bar\alpha_2 = \frac{f'(\xi_1)}{f'(\xi_2)}(\alpha_1 - \bar\alpha_1)$$

where $\xi_i$ is between $\alpha_i$ and $\bar\alpha_i$. Thus deduce that when approximations to
$\alpha_1, \alpha_2, \ldots, \alpha_k, \ldots$ are obtained in that order by successive deflations, there
follows

$$\alpha_{k+1} - \bar\alpha_{k+1} \approx \frac{f'(\alpha_k)}{f'(\alpha_{k+1})}(\alpha_k - \bar\alpha_k) + \delta_k$$

where $\delta_k$ is of the order of the adopted error tolerance. [Hence a significant error
amplification generally occurs if $|f'(\alpha_{k+1})| \ll |f'(\alpha_k)|$.]

103 Illustrate the result of Prob. 102 in the case of the polynomial $f(x) =
(x - 1)(x - 4)(x - 4.1)$, showing that deflation with the approximation 4.09
to the largest zero yields the nearly equal smaller zero with about the same
accuracy and the smallest zero with an error of about 3 x 10. 5, whereas 
deflation with 1.01 as an approximation to the smallest zero yields the other
zeros with errors of magnitude exceeding 0.12; relate these facts to Prob. 102.
[In the second case, the quantitative deviation is due to the rapid variation of
$f'(x)$ near $x = 4$.]


Section 10.16 


104 Determine the largest root in Prob. 51 to four places by Bernoulli iteration.
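A minimal sketch of Bernoulli iteration for the cubic of Prob. 51 is given below. The function name, the starting values $0, \ldots, 0, 1$, and the fixed step count are assumptions; the text's own choice of starting values and its notation for the sequence may differ.

```python
def bernoulli_dominant_root(coeffs, steps=60):
    """Estimate the dominant zero of a monic polynomial by Bernoulli iteration.

    coeffs = [a_1, ..., a_n] for x**n + a_1 x**(n-1) + ... + a_n.
    The sequence u_k = -(a_1 u_{k-1} + ... + a_n u_{k-n}) is generated from the
    assumed starting values 0, ..., 0, 1, and the last ratio u_k / u_{k-1} is returned.
    """
    n = len(coeffs)
    u = [0.0] * (n - 1) + [1.0]
    ratio = None
    for _ in range(steps):
        new = -sum(a * u[-1 - j] for j, a in enumerate(coeffs))
        ratio = new / u[-1]
        u = u[1:] + [new]
    return ratio


# x^3 - 9x^2 + 18x - 6 = 0
largest = bernoulli_dominant_root([-9.0, 18.0, -6.0])
```

The ratios converge linearly, with a factor equal to the ratio of the two largest root magnitudes, which is why acceleration is worth considering when those magnitudes are close.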
105 Determine the largest root in Prob. 52 to four places by Bernoulli iteration.
Also, after replacing $x$ by $1/x$, determine the smallest root in a similar way. Then
determine all roots to four places.

106 Show that the Bernoulli iteration converges very slowly when applied to Prob. 50,
and account for this fact. Then translate the origin to a convenient point near 
the root (see Prob. 89), replace x by 1/x, and apply the iteration to determine the 
reciprocal of the real root to six places. 

Show that the yu, 5, and ¢ sequences all behave unsatisfactorily when the Bernoulli 
iteration is applied to Prob. 53. Then replace x by 1/x, use the iteration to 
determine the smallest root to four places, and determine the other real root after 
translating the origin to a nearby point and replacing x by 1/x. Finally, determine 
the remaining roots and account for the original difficulty. 

108 Apply the Bernoulli iteration to the equation

$$x^4 - 8x^3 + 39x^2 - 62x + 50 = 0$$

determining the larger pair of complex roots to three significant figures.

109 Show that if $|\alpha_1| > |\alpha_2| > |\alpha_3|$ and if $C_1 \ne 0$ and $C_2 \ne 0$ in (10.16.4), there
follows

$$\alpha_1 \approx r_k + A\beta^k$$

where $\beta = \alpha_2/\alpha_1$, with the notation of (10.16.5), and where $0 < \beta < 1$, so that
Aitken's $\Delta^2$ process then is applicable for accelerating the convergence of the
Bernoulli iteration. [Compare (10.10.9).]

110 Apply the $\Delta^2$ process to the determination of the largest root in Prob. 51 to five
places by Bernoulli iteration.


Section 10.17 


111 to 114 Determine all roots of the equations in Probs. 50 to 53 to three decimal
places by the Graeffe procedure.

115 Use the Graeffe procedure to determine all roots of the equation in Prob. 108
to three significant figures.

116 Suppose that the Graeffe iteration is applied to the polynomial $f(x) = x^4 - 1$.

(a) Show that $f_1(x) = (x^2 - 1)^2$, $f_2(x) = (x - 1)^4$, and so on, and hence
that after $k$ root squarings with $k \ge 2$ one would know only that the four
zeros of $f(x)$ are among the $(2^k)$th roots of unity and, in particular, that they
lie on the unit circle $|z| = 1$ in the complex plane.

(b) Show that if the same iteration were also applied to the polynomial $f(x + \varepsilon) =
(x + \varepsilon)^4 - 1$, where $0 < \varepsilon < 1$, it would be found that the zeros of that
function lie on circles of radii $1 + \varepsilon$, $1 - \varepsilon$, and $\sqrt{1 + \varepsilon^2}$ and center at
$z = 0$, and hence that the zeros of $f(x)$ lie on the circles $|z - \varepsilon| = 1 + \varepsilon$,
$1 - \varepsilon$, and $\sqrt{1 + \varepsilon^2}$.

(c) Verify (a sketch is sufficient) that the intersections of $|z| = 1$ with the
circles specified at the end of part (b) uniquely determine the four zeros of
$f(x)$.

(The same method succeeds in less contrived situations, provided that $\varepsilon$ is
taken to be sufficiently small.)


Section 10.18 


117 With the notation of (10.18.2), show that if $x - z$ is a factor of $x^2 + px + q$,
then $f(z) = Rz + S$. Thus deduce that if $f(x)$ is a polynomial, then quadratic
synthetic division can be used to evaluate $f(u + iv)$ in the form

$$f(u + iv) = (u + iv)R + S$$

where $R$ and $S$ correspond to

$$p = -2u \qquad q = u^2 + v^2$$

Also illustrate this procedure by evaluating $f(2 + 3i)$ when

$$f(x) = x^5 - 4.17x^4 + 2.81x^3 + 1.09x^2 + 3.21x + 1.42$$

(Compare with the procedure of Prob. 86.)
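The evaluation described in Prob. 117 can be sketched as follows, assuming that (10.18.2) defines $R$ and $S$ through $f(x) = (x^2 + px + q)Q(x) + Rx + S$, as the statement $f(z) = Rz + S$ suggests. The function names are hypothetical.

```python
def quadratic_remainder(coeffs, p, q):
    """Return (R, S) with f(x) = (x**2 + p x + q) Q(x) + R x + S.

    coeffs lists the real coefficients of f, highest power first (degree >= 1).
    """
    b1 = b2 = 0.0  # the two most recent quotient coefficients
    for a in coeffs[:-1]:
        b1, b2 = a - p * b1 - q * b2, b1
    return b1, coeffs[-1] - q * b2


def evaluate_complex(coeffs, u, v):
    """Evaluate f(u + iv); the division stage uses real arithmetic only."""
    R, S = quadratic_remainder(coeffs, -2.0 * u, u * u + v * v)
    return complex(u, v) * R + S


coeffs = [1.0, -4.17, 2.81, 1.09, 3.21, 1.42]
value = evaluate_complex(coeffs, 2.0, 3.0)
```

Only the final multiplication by $u + iv$ involves complex arithmetic, which is the attraction of this arrangement when the coefficients are real.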
118 Show that the equation

$$x^4 - 9.00x^3 + 29.08x^2 - 39.52x + 18.82 = 0$$

has roots near $x = 1$ and $x = 2$ and two roots near $x = 3$, and determine the
roots to four decimal places by extracting an approximate quadratic factor by
Lin iteration, starting with $(x - 3)^2$.

119 Determine all roots of the equation

$$x^4 + 9x^3 + 36x^2 + 51x + 27 = 0$$

to four decimal places by iterative Lin extraction of a quadratic factor.
120 With the notation of Sec. 10.18, show that the Lin iteration (10.18.11) can be
written in the form

$$p^* - p = \frac{qR}{a_n - S} \qquad q^* - q = \frac{qS}{a_n - S}$$

and that

$$Rx_1 + S = f(x_1) \qquad Rx_2 + S = f(x_2)$$

where $x_1$ and $x_2$ are the zeros of $x^2 + px + q$, and hence deduce the relations

$$x_1(p^* - p) + (q^* - q) = \frac{qf(x_1)}{a_n - S}$$

$$x_2(p^* - p) + (q^* - q) = \frac{qf(x_2)}{a_n - S}$$

Then show that these relations can be written in the forms

$$(x_2 - x_1)(x_1^* - x_1) = \frac{qf(x_1)}{a_n - S} - (x_1^* - x_1)(x_2^* - x_2)$$

$$(x_2 - x_1)(x_2^* - x_2) = -\frac{qf(x_2)}{a_n - S} + (x_1^* - x_1)(x_2^* - x_2)$$

where $x_1^*$ and $x_2^*$ are the zeros of $x^2 + p^*x + q^*$, and deduce that, when $(x_1, x_2)$
and $(x_1^*, x_2^*)$ are near $(\alpha_1, \alpha_2)$, there follows

$$\alpha_1 - x_1^* \approx \left[1 + \frac{\alpha_1\alpha_2}{\alpha_2 - \alpha_1}\frac{f'(\alpha_1)}{a_n}\right](\alpha_1 - x_1) = \rho_1(\alpha_1 - x_1)$$

$$\alpha_2 - x_2^* \approx \left[1 - \frac{\alpha_1\alpha_2}{\alpha_2 - \alpha_1}\frac{f'(\alpha_2)}{a_n}\right](\alpha_2 - x_2) = \rho_2(\alpha_2 - x_2)$$

where $\rho_1$ and $\rho_2$ are the convergence factors listed in (10.18.12). Thus show that,
if the zeros $x_1$ and $x_2$ of $x^2 + px + q$ approximate two zeros $\alpha_1$ and $\alpha_2$ of
$f(x)$, and if $x_1^*$ and $x_2^*$ are the zeros of $x^2 + p^*x + q^*$, then $x_1^*$ is generally a
poorer approximation to $\alpha_1$ than $x_1$ unless $|\rho_1| \le 1$ and $x_2^*$ a poorer
approximation to $\alpha_2$ than $x_2$ unless $|\rho_2| \le 1$.

121 In the case of a quartic equation, show that the asymptotic convergence factors
for the Lin method relevant to the root pair $\alpha_1$, $\alpha_2$ are

$$\rho_1 = \frac{\alpha_1}{\alpha_3\alpha_4}(\alpha_3 + \alpha_4 - \alpha_1) \qquad \rho_2 = \frac{\alpha_2}{\alpha_3\alpha_4}(\alpha_3 + \alpha_4 - \alpha_2)$$

where $\alpha_3$ and $\alpha_4$ are the remaining roots. Show, in particular, that the Lin
iteration should converge rather rapidly to the root pair near $x = 3$ in Prob. 118,
but that convergence to the pair near $x = 1$ and $x = 2$ should be very slow;
verify the last fact by direct calculation.

122 Determine the first five quadratics yielded by the Lin iteration as applied to the
equation $x^4 - 4x^3 + 7x^2 - 16x + 12 = 0$, starting with $p = q = 0$, and
show that one of the zeros of the sequence of quadratics tends to approximate
the smallest zero ($x = 1$) of the given equation. Also, use the fact that the four
zeros are $x = 1$, 3, and $\pm 2i$ to show that this situation is in accordance with
the results of Prob. 120.


Section 10.19 


123, 124 Repeat the determinations of Probs. 118 and 119, using the Bairstow
iteration.

125 Show that, if the Newton-Raphson iteration is applied to the equations $b_{n-1} = 0$
and $b_n = 0$, rather than to the equivalent equations $b_{n-1} = 0$ and $b_n +
pb_{n-1} = 0$, then the Bairstow procedure is modified only to the extent that $\bar c_{n-1}$
is replaced by $c_{n-1}$ in (10.19.16), so that all elements in the $c$ column then are to
be calculated from elements in the $b$ column by the same rule. Also, apply this
procedure to the example in the text, showing that the modification appears to
lead to somewhat slower convergence in that case.

126 With the notation of (10.18.2) and (10.19.11), show that the Bairstow iteration
can be described by the equations

$$(S' - pR')\,\Delta p + R'\,\Delta q = R$$
$$-qR'\,\Delta p + S'\,\Delta q = S$$

(These are the forms originally given by Bairstow.)

127 Show that if Bairstow's method is modified so that the values of the $c$'s
determined for $p$ and $q$ are used in all steps, then in the case of Eq. (10.18.15), with
$p = -1.9$ and $q = 1.9$, the iteration formulas can be written in the form

$$\Delta p = 0.0586R + 0.0104S$$
$$\Delta q = -0.0198R + 0.0388S$$

and determine the results of two iterations. Also compare them with the results
of the corresponding Lin method obtained in Sec. 10.19.

128, 129 Use the modified Lin iteration in Probs. 118 and 119, starting with $(x - 3)^2$
in the first case and with $x^2 + 2.0x + 1.3$ in the second.

130, 131 Proceed as in Probs. 128 and 129 with the stationary modification of
Bairstow iteration.

132 The equation

$$2x^4 + 24x^3 + 61x^2 - 16x + 1 = 0$$

has two nearly equal small roots (see Prob. 56). Determine them to seven places
by any method employing quadratic synthetic division.


APPENDIX A 
JUSTIFICATION OF THE CROUT REDUCTION 


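The Gauss reduction of Sec. 10.3 reduces the set of equations

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= c_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n &= c_2 \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
a_{n1}x_1 + a_{n2}x_2 + a_{n3}x_3 + \cdots + a_{nn}x_n &= c_n
\end{aligned} \tag{A.1}$$

to an equivalent set of the form

$$\begin{aligned}
x_1 + a'_{12}x_2 + a'_{13}x_3 + \cdots + a'_{1n}x_n &= c'_1 \\
x_2 + a'_{23}x_3 + \cdots + a'_{2n}x_n &= c'_2 \\
\cdots\cdots\cdots\cdots\cdots & \\
x_n &= c'_n
\end{aligned} \tag{A.2}$$

after which the required values of $x_n, x_{n-1}, \ldots, x_1$ are obtained simply by solving
the set (A.2) successively, in reverse order.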
Since, in the Gauss reduction, the $k$th equation of (A.2) is obtained by a sequence
of operations which involve the subtraction of multiples of the first $k - 1$ equations
of (A.2) from the $k$th equation of (A.1) and the division of the result by a constant, it
follows also that the $k$th equation of (A.1) can be expressed as a linear combination of
the first $k$ equations of (A.2), so that a set of constants $a'_{ij}$ exists, with $i \ge j$, such that

$$\begin{aligned}
a'_{11}c'_1 &= c_1 \\
a'_{21}c'_1 + a'_{22}c'_2 &= c_2 \\
\cdots\cdots\cdots\cdots\cdots & \\
a'_{n1}c'_1 + a'_{n2}c'_2 + a'_{n3}c'_3 + \cdots + a'_{nn}c'_n &= c_n
\end{aligned} \tag{A.3}$$

The Crout reduction amounts to first determining the coefficients $a'_{ij}$ in such a
way that the elimination of $c'_1, \ldots, c'_n$ between (A.2) and (A.3) leads to (A.1), then
determining $c'_1, \ldots, c'_n$ from (A.3), and finally resolving (A.2) for $x_1, \ldots, x_n$ by the
"back solution" of the Gauss procedure. In order to simplify the derivation of formulas
for the determination of the coefficients $a'_{ij}$ involved in both (A.2) and (A.3), it is
convenient to introduce the temporary notations

$$\alpha_{ij} = \begin{cases} a'_{ij} & (i \ge j) \\ 0 & (i < j) \end{cases} \qquad
\beta_{ij} = \begin{cases} 0 & (i \ge j) \\ a'_{ij} & (i < j) \end{cases} \tag{A.4}$$

The three sets (A.1) to (A.3) can then be specified by the equations

$$\sum_{j=1}^{n} a_{ij}x_j = c_i \tag{A.1'}$$

$$x_k + \sum_{j=1}^{n} \beta_{kj}x_j = c'_k \tag{A.2'}$$

and

$$\sum_{k=1}^{n} \alpha_{ik}c'_k = c_i \tag{A.3'}$$

where all indices range from 1 to $n$.
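The introduction of (A.2') into (A.3') then gives

$$\sum_{k=1}^{n} \alpha_{ik}x_k + \sum_{j=1}^{n}\left(\sum_{k=1}^{n} \alpha_{ik}\beta_{kj}\right)x_j = c_i$$

and this relation is equivalent to (A.1') if

$$\alpha_{ij} + \sum_{k=1}^{n} \alpha_{ik}\beta_{kj} = a_{ij} \tag{A.5}$$

By virtue of (A.4), the first term $\alpha_{ij}$ is zero unless $i \ge j$ and the summand in the second
term vanishes unless both $k \le i$ and $k < j$. Thus, when $i \ge j$, (A.5) becomes

$$a'_{ij} + \sum_{k=1}^{j-1} a'_{ik}a'_{kj} = a_{ij} \qquad (i \ge j) \tag{A.6}$$

whereas, when $i < j$, it can be written in the form

$$a'_{ii}a'_{ij} + \sum_{k=1}^{i-1} a'_{ik}a'_{kj} = a_{ij} \qquad (i < j) \tag{A.7}$$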
These relations, together with the relations

$$a'_{ii}c'_i + \sum_{k=1}^{i-1} a'_{ik}c'_k = c_i \tag{A.8}$$

and

$$x_i + \sum_{k=i+1}^{n} a'_{ik}x_k = c'_i \tag{A.9}$$

which are equivalent to (A.3') and (A.2'), respectively, are identical with the relations
(10.4.4) to (10.4.7), establishing the validity of the Crout reduction as described in
Sec. 10.4.
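The relations (A.6) to (A.9) translate almost line for line into code. The following is a minimal sketch under stated assumptions: indices are zero-based, no pivoting or equilibration is attempted, and the function name `crout_solve` and the test matrix are illustrative rather than taken from the text.

```python
def crout_solve(a, c):
    """Solve a x = c by the Crout reduction; returns (x, auxiliary matrix a')."""
    n = len(a)
    ap = [[0.0] * n for _ in range(n)]  # auxiliary matrix of the a'_ij
    cp = [0.0] * n                      # the c'_i
    for m in range(n):
        # (A.6): elements on and below the diagonal in column m.
        for i in range(m, n):
            ap[i][m] = a[i][m] - sum(ap[i][k] * ap[k][m] for k in range(m))
        # (A.7): elements to the right of the diagonal in row m.
        for j in range(m + 1, n):
            ap[m][j] = (a[m][j] - sum(ap[m][k] * ap[k][j] for k in range(m))) / ap[m][m]
        # (A.8): the corresponding right-hand member.
        cp[m] = (c[m] - sum(ap[m][k] * cp[k] for k in range(m))) / ap[m][m]
    # (A.9): back solution, in reverse order.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = cp[i] - sum(ap[i][k] * x[k] for k in range(i + 1, n))
    return x, ap


a = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
c = [14.0, 21.0, 26.0]  # chosen so that the solution is (1, 2, 3)
x, ap = crout_solve(a, c)
```

Each pass fills one column and one row of the auxiliary matrix using only entries computed in earlier passes, which is what permits the single compact tabulation.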

Clearly, the compactness of the relevant tabulation follows from the fact that,
after suppressing the diagonal 1s in the coefficient matrix of (A.2) and the right-hand
members of (A.3), which are also contained in the matrix of (A.1), the remaining
elements of the two matrices associated with (A.2) and (A.3) can be recorded in a
single auxiliary matrix. In order to establish the relation

$$a'_{ii}a'_{ij} = a'_{ji} \qquad (i < j) \tag{A.10}$$

in the special case when the given coefficient array is symmetric, so that

$$a_{ji} = a_{ij} \tag{A.11}$$

we may verify that (A.6) and (A.7) imply the relation

$$a'_{ii}a'_{ij} - a'_{ji} = a_{ij} - a_{ji} + \sum_{k=1}^{i-1}(a'_{jk}a'_{ki} - a'_{ik}a'_{kj}) \qquad (i < j)$$

which can be written in the form

$$a'_{ii}a'_{ij} - a'_{ji} = \sum_{k=1}^{i-1}\left[a'_{kj}(a'_{kk}a'_{ki} - a'_{ik}) - a'_{ki}(a'_{kk}a'_{kj} - a'_{jk})\right] \qquad (i < j) \tag{A.12}$$

if (A.11) is true. When $i = 1$, the sum on the right is absent, so that (A.10) is
established in that case. When $i > 1$, (A.12) expresses $a'_{ii}a'_{ij} - a'_{ji}$ as a linear combination
of terms of the form $a'_{rr}a'_{rs} - a'_{sr}$, where $r < i$ and $r < s$, so that (A.10) is established
by induction on $i$.

The validity of the gross-error check described in Sec. 10.4 follows from the
fact that an increase of each solution element $x_i$ by unity would correspond to an
increase of $c_i$ by $\sum_{k=1}^{n} a_{ik}$, according to (A.1'), and likewise to an increase of $c'_i$ by
$1 + \sum_{k=i+1}^{n} a'_{ik}$, according to (A.2').

Since the $k$th equation of (A.2) can be obtained by subtracting from the $k$th
equation of (A.1) a certain linear combination of the first $k - 1$ equations of (A.1),
and dividing the result by $a'_{kk}$, it follows, from the elementary properties of
determinants, that the determinant of any square array of order $k$ formed from elements in
the first $k$ rows of the augmented matrix of (A.1) is given by the result of multiplying
the determinant of the corresponding array in (A.2) by $a'_{11}a'_{22}\cdots a'_{kk}$. In particular, we
thus obtain the useful results

$$a_{11} = a'_{11} \qquad
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a'_{11}a'_{22} \qquad
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a'_{11}a'_{22}a'_{33}$$

$$\cdots \qquad
\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = a'_{11}a'_{22}\cdots a'_{nn} \tag{A.13}$$

When the matrix composed of the coefficients in (A.1) is symmetric, it is said to 
be also positive definite if and only if each of the n principal minors indicated in (A.13) 
is positive. It follows that this matrix is positive definite if and only if all the diagonal 
elements of the associated Crout auxiliary matrix are positive. 
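
The results (A.13) and the resulting test for positive definiteness can be illustrated as follows. This is a minimal sketch under the assumption that the Crout recurrences have their standard form; the helper names `crout_diagonal`, `det`, and `is_positive_definite` and the sample arrays are hypothetical, and the determinant routine (expansion by minors) is meant only for small arrays.

```python
from fractions import Fraction
from itertools import accumulate
from operator import mul


def crout_diagonal(a):
    """Diagonal elements a'_kk of the Crout auxiliary matrix (no pivoting).

    Stops early if a zero diagonal element appears, since the reduction
    cannot then continue without interchanges.
    """
    n = len(a)
    aux = [[Fraction(0)] * n for _ in range(n)]
    diag = []
    for m in range(n):
        for i in range(m, n):
            aux[i][m] = Fraction(a[i][m]) - sum(aux[i][k] * aux[k][m] for k in range(m))
        diag.append(aux[m][m])
        if aux[m][m] == 0:
            break
        for j in range(m + 1, n):
            aux[m][j] = (Fraction(a[m][j])
                         - sum(aux[m][k] * aux[k][j] for k in range(m))) / aux[m][m]
    return diag


def det(m):
    """Determinant by expansion along the first row (small arrays only)."""
    if len(m) == 1:
        return Fraction(m[0][0])
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def is_positive_definite(a):
    """For a symmetric array a: true when every Crout diagonal element is positive."""
    diag = crout_diagonal(a)
    return len(diag) == len(a) and all(d > 0 for d in diag)


a = [[4, 2, 1],
     [2, 5, 3],
     [1, 3, 6]]
# Running products a'_11, a'_11 a'_22, ... compared with the leading principal minors.
products = list(accumulate(crout_diagonal(a), mul))
minors = [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]
print(products == minors)
print(is_positive_definite(a), is_positive_definite([[1, 2], [2, 1]]))
```

The second array in the last line is symmetric but has a negative second diagonal element in its auxiliary matrix, so it fails the test.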
APPENDIX C 
DIRECTORY OF METHODS 


A INTERPOLATION 


1 Based on polynomials 
  (a) Using an arbitrary set of ordinates without differences 
    (1) Noniterative: Sec. 3.2 
    (2) Iterative: Sec. 2.7 
  (b) Using differences formed from ordinates at equally spaced points 
    (1) Near beginning or end of tabulation: Sec. 4.3 
    (2) Inside tabular range 
      (a) Using both odd and even differences: Secs. 4.5 and 4.6 
      (b) Using only even differences or only odd differences: Sec. 4.7 
    (3) With throwback: Sec. 4.10 
  (c) Using divided differences formed from an arbitrary set of ordinates: Sec. 2.5 
  (d) Using ordinates and slopes: Sec. 8.2 
  (e) Using ordinates at appropriately selected points: Secs. 9.6 and 9.7 
  (f) In two-way tables: Probs. 3 to 5 of Chap. 5 
  (g) Inverse: Sec. 2.8 
2 Based on ratios of polynomials: Secs. 9.14 to 9.17 
3 Based on sines and/or cosines: Prob. 7 of Chap. 3; see also B4† 
4 Based on exponential functions: see B5† 
5 Based on cubic splines: Secs. 9.10 to 9.13 


B APPROXIMATION 


1 By polynomials 
  (a) Determined as truncated Taylor expansions: Secs. 1.3 and 1.9 
  (b) Determined by exact fit over a discrete set of points: see A1 
  (c) Determined by least-squares methods 
    (1) Using an arbitrary finite set of ordinates: Sec. 7.3 
    (2) Using ordinates at equally spaced points: Sec. 7.13 
    (3) Using a continuous set of ordinates 
      (a) Over a finite interval: Secs. 7.6 and 7.9; Prob. 33 of Chap. 7 
      (b) Over a semi-infinite interval: Sec. 7.7 
      (c) Over an infinite interval: Sec. 7.8 
  (d) Economized by use of Chebyshev polynomials: Sec. 9.8 
  (e) Determined by a minimax condition: Sec. 9.9 
2 By products of exponential functions and polynomials: Secs. 7.7 and 7.8 
3 By ratios of polynomials: see A2 
  (a) Uniformized: Sec. 9.18 
4 By sines and/or cosines 
  (a) With prescribed periods 
    (1) Using a finite set of ordinates: Secs. 9.3 and 9.7; Prob. 33 of Chap. 7 
    (2) Using a continuous set of ordinates: Sec. 9.2 
  (b) With periods to be determined: Sec. 9.5 
5 By exponential functions: Sec. 9.4 
6 By cubic splines: Secs. 9.10 to 9.13 


C NUMERICAL DIFFERENTIATION 


1 Using ordinates without differences: Secs. 3.3, 3.11, and 9.13 
2 Using differences formed from ordinates at equally spaced points 
  (a) Near beginning or end of tabulation: Sec. 5.3; Prob. 5 of Chap. 4 
  (b) Inside tabular range 
    (1) Near tabular point: Sec. 5.3; Prob. 13 of Chap. 4 
    (2) Near midpoint between tabular entries: Sec. 5.3; Prob. 16 of Chap. 4 


† The approximating functions obtained by least squares incorporating a number of 
ordinates equal to the number of independent coordinate functions fit the data 
exactly at those points and thus are strictly interpolative functions. 


D NUMERICAL INTEGRATION 


1 Using ordinates at equally spaced points without differences 
  (a) Without end corrections: Secs. 3.5, 3.6, 3.8, and 9.13 
    (1) Integrals of the form $\int_a^b f(x) \cos kx \, dx$ or $\int_a^b f(x) \sin kx \, dx$: Sec. 3.10 
  (b) With end corrections: Sec. 5.8; Prob. 18 of Chap. 5 
2 Using differences based on ordinates at equally spaced points 
  (a) Over range near beginning or end of tabulation: Sec. 5.4; Prob. 5 of Chap. 4 
  (b) Over range centered at interior tabular point: Sec. 5.6; Prob. 13 of Chap. 4 
  (c) Over range centered midway between successive tabular points: Probs. 16 and 
      18 of Chap. 4 

3 Using ordinates at appropriately selected points 
  (a) Integrals of the form $\int_a^b f(x) \, dx$: Sec. 8.5 
  (b) Integrals of the form $\int_0^\infty e^{-x} f(x) \, dx$: Sec. 8.6 
  (c) Integrals of the form $\int_a^\infty (x - a)^\beta e^{-x} f(x) \, dx$: Sec. 8.6 
  (d) Integrals of the form $\int_{-\infty}^{\infty} e^{-x^2} f(x) \, dx$: Sec. 8.7 
  (e) Integrals of the form $\int_{-a}^{a} f(x) \, dx / \sqrt{a^2 - x^2}$: Sec. 8.8; Prob. 41 of Chap. 8 
  (f) Integrals of the form $\int_{-a}^{a} f(x) \sqrt{a^2 - x^2} \, dx$: Prob. 28 of Chap. 8 
  (g) Integrals of the form $\int_a^b (x - a)^\alpha (b - x)^\beta f(x) \, dx$: Sec. 8.9 
  (h) Integrals of the form $\int_0^\infty f(x) \sin kx \, dx$ or $\int_0^\infty f(x) \cos kx \, dx$: Sec. 8.16 
  (i) Integration formulas involving ordinates at one or both of the integration 
      limits: Secs. 8.11 and 8.12; Probs. 32 and 41 of Chap. 8 
  (j) Integration formulas employing equal weights: Sec. 8.14 
  (k) Algebraic derivation of miscellaneous formulas: Sec. 8.15 
4 Using ordinates and slopes: Sec. 8.3 (see also Sec. 6.12) 
5 Repeated: Secs. 5.5 and 5.6; Prob. 15 of Chap. 5 
6 Two-way: Probs. 40 and 41 of Chap. 3 


E SUMMATION OF SERIES 


1 Finite sums of polynomials: Secs. 5.8 and 7.11 
2 Approximate summation of series: Secs. 5.8 and 5.9; Prob. 8 of Chap. 1 and Prob. 
  37 of Chap. 5 
  (a) Terms of constant sign: Sec. 5.9 [Eqs. (5.9.1) and (5.9.2)] 
  (b) Terms of alternating signs: Sec. 5.9 [Eqs. (5.9.7) and (5.9.8)] 


F SMOOTHING OF DATA 


1 By determining a smooth approximating function: Sec. 7.13 (see also B1c and B4) 
2 By point-by-point modification of data: Sec. 7.15 


G NUMERICAL SOLUTION OF ORDINARY DIFFERENTIAL EQUATIONS 


1 Initial-value problems: see Sec. 6.17 
2 Boundary-value problems: Sec. 6.15 
3 Characteristic-value problems: Sec. 6.16 


H NUMERICAL SOLUTION OF EQUATIONS 


1 Sets of linear algebraic equations 
  (a) By use of determinants: Sec. 10.2 
  (b) By a finite sequence of reductions: Secs. 10.3, 10.4, and 10.8 
  (c) By iteration: Sec. 10.9 
2 Nonlinear equations 
  (a) General iterative methods: Secs. 10.10 to 10.12 (see also Secs. 2.8 and 9.17) 
  (b) Special iterative methods for algebraic equations 
    (1) Approximate determination of largest or smallest root: Sec. 10.16 
    (2) Simultaneous approximate determination of all roots: Sec. 10.17 
    (3) Approximate determination of one root: Sec. 10.14 
    (4) Simultaneous approximate determination of two roots: Secs. 10.18 and 
        10.19 


I MISCELLANEOUS PROCESSES 


1 Inversion of power series: Probs. 16 and 46 of Chap. 1 
2 Expansion of one function in powers of another: Sec. 1.9 
3 Locating maxima or minima: Prob. 23 of Chap. 2 
4 Checking tables by use of differences: Sec. 4.9 
5 Expression of differences in terms of derivatives: Sec. 5.3 
6 Subtabulation: Sec. 5.7; Probs. 21 and 22 of Chap. 5 
7 Calculation of mean values over given intervals from known mean values over 
  other intervals: Prob. 23 of Chap. 5 
8 Determination of unknown periodicities from empirical data: Sec. 9.5 
9 Continued-fraction representations: Secs. 9.14 to 9.17 
10 Evaluation of determinants: Sec. 10.4 
11 Inversion of matrices: Sec. 10.6 


Index 


Italic numbers in parentheses following page references indicate problem numbers. 


Adams’ method, 251, 272 
modified, 257, 272 
Aitken’s Δ² process, 571 
Aitken’s iterative interpolation, 66 
Approximants, 496 
Approximation, 3 
methods of, 660 
Asymptotic convergence factor, 568 
Asymptotic error constant, 578 
Asymptotic series, 9 
Asymptotic stability, 568 
Augmented matrix, 540 
Automatic integrators, 115 
Averaging operator, 175 


Backward differences, 130 
Bailey’s method, 582 


Bairstow’s iteration, 613 
stationary modification of, 618 
Banachiewicz’ reduction, 545n. 
Bernoulli numbers, 199 
Bernoulli polynomials, 228 (24), 229 
(25) 
Bernoulli’s iteration, 598, 636 (106, 
109) 
Bessel’s interpolation formula, 141 
coefficient tables, 160, 161 
Binomial coefficients, 87, 135, 344 
Birge-Vieta iteration, 590 
Bisection method, 575 
for two simultaneous equations, 587 
Boundary-value problems, 293, 303 
Bridging differences, 222 
Budan’s rule, 37 
Bürmann series, 36 


Central differences, 131 
mean, 139 
Centroid method, 115 
Cesaro summation, 208 
Characteristic-value problems, 297, 619 
Chebyshev approximation, 336, 372 
(33), 476 
Chebyshev-Gauss quadrature, 398, 420, 
438 (28) 
Chebyshev interpolation, 469 
Chebyshev polynomials, 337, 352, 418, 
525 (28) 
of second kind, 340, 372 (33) 
Chebyshev quadrature, 414, 425 
Chebyshev’s formula, 582 
Cholesky’s reduction, 545n. 
Chopping, 10n. 
Christoffel-Darboux identity, 342, 389 
Christoffel numbers, 388 
Clenshaw’s method, 344 
Coefficient matrix, 540 
Cofactor, 321, 540 
reduced, 321, 541 
Composite rules for numerical 
integration, 95 
convergence of, 122 (30) 
Condition of matrix, 559n. 
Conjugate-gradient method, 564 
Continued-fraction approximation, 494-514 
convergents of, 496, 502 
by Thiele’s expansions, 510 
Convergence of series, tests for, 40 (6, 
7), 41 (10), 170n. 
Convergence factor, 255, 568, 571 
asymptotic, 568 
Convergents, 496, 502 
Cotes integration formulas, 95 
Cramer’s rule, 541 
Critical table, 79 (45) 
Crout’s reduction, 545, 641 
check columns for, 548, 558, 642 
for tridiagonal systems, 559 
Cubature, 123 (40), 124 (41) 
Cyclic displacement, 565 
Cyclic relaxation, 565 


Deferred approach to the limit, 100n. 
Deflation, 596 
Degree of precision, 209, 386 
Delta operator, 177 
Δ² process, 571 
Derivative: 
inverse, 509 
reciprocal, 509 
Descartes’ rule, 37 
Determinacy: 
of functions, 16 
of zeros of polynomials, 595 
Determinant: 
evaluation of, 548 
Jacobian, 585 
Vandermonde’s, 116 (5) 
Diagonal element, 546 
Diagonal row dominance, 563 
Difference equation, 258, 349, 482, 527 
(39) 
Difference operator, 175 
Differences: 
backward, 130 
central, 131 
divided, 52, 54, 495 
forward, 130 
inverted, 496 
mean central, 139 
modified for throwback, 153 
reciprocal, 507 
Differential equations, numerical solution 
of, 240-303 
boundary-value problems, 293, 303 
characteristic-value problems, 297, 
619 
selection of method for, 301 
Differential operator, 175 
Differentiation, numerical, 85, 110, 163 
(5), 164 (13, 16), 181, 491, 660 
by iterative method, 113 
by use of spline approximation, 491 
Displacement: 
cyclic, 565 
simultaneous, 565 
Distribution, normal, 24 
Divided differences, 52, 54, 495 
confluent forms of, 56 


Divided differences: 
as contour integrals, 171 (38) 
of product, 72 (10) 
Doolittle’s reduction, 545n., 619 
Double-precision operations, 20 


Economization of power series, 471 
Eigenvalue, 297 
Equal-ripple property, 475, 517 
Equations: 
normal, 316 
solutions of, 539-618 
(See also Differential equations; 
Linear algebraic equations) 
Equilibration, 552 
Error formulas: 
in integration, 208, 216 
G method, 210 
Q method, 218 
V method, 215 
in interpolation, 61, 62, 171 (39) 
Error function, 24, 41 (12), 42 (13, 14) 
Errors, 5 
classification of, 5 
detection of, 151 
in difference tables, 151, 161 
gross, 5 
inherent, 5, 85, 555 
machine, 19 
probable, 26 
random, 23 
relative, 11 
roundoff, 5 
truncation, 5 
Euler-Maclaurin sum formula, 197, 493 
second, 201 
Euler sum of divergent series, 206 
Euler’s constant, 232 (38) 
Euler’s method, 251 
modified, 257 
Euler’s transformation, 206 
modified, 206 
Everett’s interpolation formula, 143 
coefficient tables, 160, 161 
second, 144, 166 (19) 
coefficient tables, 160, 161 
Expected value, 25 
Exponential approximation, 115, 457 
Extrapolation: 

Aitken’s, 571 

Richardson’s, 100, 292n., 492, 494 


Factorial, Stirling’s approximation to, 
104, 233 (39) 
Factorial power functions, 344 
multiplication formula for, 375 (48) 
False position, method of, 572, 626 (42) 
accelerated, 574, 626 (44) 
for two simultaneous equations, 587 
Filon quadrature, 108 
with corrections, 110 
First law of the mean, 32 
Fixed-point mode, 20 
Flinn’s method, 115 
Floating-point mode, 20 
Forcing, 10 
Forward differences, 130 
Fourier approximation, 447, 452 
Fourier transforms, 427 
Fourier’s conditions, 577n. 
Frequency function, 23, 45 (25) 
normal, 24 
uniform, 45 (27) 
Friedman’s iteration, 620 
Fundamental theorem of elementary 
algebra, 36 


Gauss-Jordan reduction, 544 
check column for, 621 (4) 

Gauss-Seidel iteration, 563 

Gauss’ interpolation formulas, 136 
trigonometric, 117 (7) 

Gauss’ reduction, 543 
check column for, 548, 621 (4) 

Gauss’ sum formula, 203, 226 (18), 231 

(31) 

Gaussian quadrature, 387 
convergence of, 412 

Gill’s method, 292 

Graeffe’s iteration, 602 

Gram approximation, 350 

Gram polynomials, 352 

Gram-Schmidt process, 374 (40), 378 
(64) 

Gregory’s interpolation formulas 
(Newton’s interpolation formulas), 
134 

Gregory’s sum formula, 203, 230 (29) 

Gross errors, 5 

check column, 548, 621 (4), 642 


Halley’s method, 513n., 582 

Hamming’s method, 265, 306 (18) 

Harmonic analysis, 447, 452, 469 

Hermite approximation, 334 

Hermite-Chebyshev quadrature, 442 
(47) 

Hermite-Gauss quadrature, 395 

Hermite interpolation, 382 

trigonometric, 117 (7) 

Hermite polynomials, 334 

Hermite quadrature, 385 

Heun’s methods, 288, 290 

Hitchcock’s iteration, 613 

Horner’s method, 28, 589, 633 (86), 637 
(117) 

Hutton summation, 208 


Ill-conditioned polynomials, 597 
Ill-conditioned systems, 559 
Imaginary number, 37n. 
Influence function, 211 
Inherent errors, 5, 85, 555 
in linear algebraic equations, 555 
check column for, 558 
Inner product, 546 
Integral equation, 247, 293 
Integral operator, 175 
Integration, numerical, 85, 91, 115, 123 
(40), 124 (41), 162 (5), 164 (13, 
16), 166 (18), 186, 189, 192, 225 
(15), 379-432, 493, 661 
repeated, 189, 194 
(See also Error formulas) 
Interpolation, 5 
collocative, 5 
error formulas in, 61, 62, 171 (39) 


Interpolation: 
to halves, 142 
inverse, 68, 71, 115, 161, 168 (30), 
513 
iterated, 66 
linear, 52 
methods of, 659 
osculating, 384 
second-degree, 58 
tables of coefficients, 90, 159-161 
trigonometric, 115, 117 (7) 
in two-way tables, 162, 167 (24), 223 
(3, 4), 224 (5, 6) 
Interpolation formulas (see specific 
formulas) 
Interpolation series, 154, 162 
Inverse derivatives, 509 
Inverse interpolation, 68, 71, 115, 161, 
168 (30), 513 
error bounds for, 78 (39) 
Inverse matrix, 542, 543 
Inverse operator, 176 
Inversion: 
of matrices, 543 
of series, 43 (16) 
Inverted differences, 496 
Irreducibility: 
of matrix, 563 
of rational function, 499 
Iterated interpolation, 66, 71 
Iterative process, order of, 578, 628 (59, 
60) 


Jacobi-Gauss quadrature, 399 
Jacobi polynomials, 339 
Jacobian determinant, 585 
Jacobi’s iteration, 562 


Kronecker delta, 82, 383 
Kutta’s methods, 290 


Lagrange’s interpolation formula, 81 
coefficient tables, 90, 159, 161 
Laguerre approximation, 332 
Laguerre-Chebyshev quadrature, 442 
(46) 


Laguerre-Gauss quadrature, 392 
generalized, 394 
Laguerre polynomials, 332 
generalized, 340 
Lambert’s method, 582 
Lanczos’ economization process, 471 
Law of the mean: 
first, 32 
second, 33 
Least-squares approximations, 314 
over discrete sets of points, 318, 348 
error in coefficients, 323, 325 
on intervals, 327 
observed errors, estimation of, 323, 
325 
weights in, 315, 322 
Legendre approximation, 329, 467 
Legendre-Gauss quadrature, 390 
Legendre polynomials, 330, 467 
Leibnitz’ formula, 87 
Linear algebraic equations, 539-567 
homogeneous, 543 
inherent errors in, 555 
solvability of sets of, 542 
Linear interpolation, 52 
error bound for, 75 (24) 
Lin’s iteration, 591 
modified, 593, 634 (91) 
quadratic, 610 
modified, 616 
Lobatto quadrature, 409 


Machine errors, 19 
Maclaurin’s integration formulas, 121 
(28) 
Madelung’s method, 310 (37) 
Matrix: 
augmented, 540 
coefficient, 540 
inverse, 542 
inversion of, 552, 553 
positive definite, 644 
rank of, 542 
transpose of, 564 
Maxima and minima, numerical 
determination of, 75 (23), 477 
Mean value, 24, 227 (23) 
Mean-value theorems, 32, 33 
Mehler quadrature, 400 
Midpoint formula, 95 
Milne’s method: 
for differential equations, 257, 276 
for sets of nonlinear equations, 587 
Minimax approximation, 475, 517 
Minor, 540 
principal, 644 
Modified differences, 153 
Modulus of precision, 24 
Moments, 336 
Moulton’s method, 257 
Muller’s method, 580 


Neville’s method, 71 
Newton-Cotes integration formulas, 91, 
103 
Newton-Raphson iteration, 575, 590, 630 
(66, 67) 
modified, 578, 630 (68) 
for simultaneous equations, 584 
Newton’s backward-difference formula, 
135 
coefficient tables, 161 
Newton’s divided-difference formula, 60 
confluent form of, 74 (20) 
Newton’s forward-difference formula, 
134 
coefficient tables, 161 
Newton’s power-sum identities, 38, 425, 
600n. 
Newton’s product-sum identities, 37 
Newton’s rule, 94 
Noise level in tables, 152 
Normal distribution, 24 
Normal equations, 316 
Normalization, 317 


Obrechkoff’s formulas, 284 
Odd-harmonic function, 176 
Operator, 175 

averaging, 175 

delta, 177 
Operator: 

difference, 175 

differential, 175 

integral, 175 

inverse, 176 

reductive, 177 

shifting, 175 

theta, 177 
Order of iterative process, 578, 628 (59, 

60) 

Orthogonal polynomials, 327, 348, 363 
Orthogonality, 317 
Osculating interpolation, 384 
Ostrowski’s method, 631 (76) 
Overrelaxation, 566 


Padé table, 511 
Parabolic rule, 96 
Parasitic solutions, 260 


Periods, numerical determination of, 462 


Picard’s method, 247 
Pivoting, 552 
Positive definite matrix, 644 
Precision: 
degree of, 209, 386 
modulus of, 24 
Principal diagonal, 546 
Principal minor, 644 
Probable error, 26 
Prony’s method, 458, 462 


Quadrature (see Integration; and specific 
forms of quadrature) 


Raabe’s test, 170n. 

Radau quadrature, 406 

Random errors, 23 

Rank, 542 

Rational-function approximation, 498-518 

Reciprocal, iterative evaluation of, 627 
(48), 630 (65) 

Reciprocal derivatives, 509 

Reciprocal differences, 507 


Recursive computation, 28 
for continued fractions, 502 
for orthogonal polynomials, 340, 363 
Reduced cofactor, 321, 541 
Reduced penultimate remainder (RPR), 
612 
Reductive operator, 177 
Region of determination in difference 
table, 62 
Regula falsi, 572, 626 (42) 
accelerated, 574, 626 (44) 
for two simultaneous equations, 587 
Regular transformation, 206 
Relative error, 11 
Relaxation, 561 
cyclic, 565 
simultaneous, 565 
Relaxation table, 564 
Repeated midpoint rule, 96 
with correction terms, 202 
Repeated three-eighths rule, 122 (31) 
Residuals, 315, 550, 564 
Richardson extrapolation, 100, 292n., 
492, 494 
Riemann sum, 97, 121 (29) 
Rodrigues’ formula, 330 
Rolle’s theorem, 32 
Romberg integration, 102 
Root, iterative determination of, 627 
(47), 629 (64) 
Root-mean-square (RMS) value, 25 
Root squaring, 602 
Rounding, 10 
Roundoff errors, 5 
RPR method, 612 
Runge-Kutta methods, 285, 290 


Scaling, 20 
Secant method, 573, 626 (42), 631 (69) 
order of, 578 
Second law of the mean, 33 
Series: 
asymptotic, 9 
Bürmann, 36 
convergence tests for, 40 (6, 7), 41 
(10), 170n. 


Series: 
inversion of, 43 (16) 
summation of (see Summation of 
series) 
Taylor, 6, 34, 245 
Sheppard’s rules, 62, 163 (9) 
Shifting operator, 175 
Short-range stability, 263 
Significant figures, 10 
Simpson’s rule, 93 
with correction terms, 189 
error bounds for, 236 (52) 
Simultaneous displacement, 565 
Simultaneous equations: 
linear, 539 
nonlinear, 583 
Simultaneous relaxation, 565 
Smoothing formulas, 357 
Sonine polynomials, 340 
Spline approximation, 478-494 
minimal property of, 480 
natural, 481 
periodic, 482 
Spline-on-spline computation, 492 
Square root, iterative determination of, 
627 (47), 629 (64) 
Stability: 
asymptotic, 568 
short-range, 263 
Standard deviation, 24, 45 (26) 
estimated, 322 
Steepest descent, method of, 587 
Steffensen’s error test, 40 (6) 
Steffensen’s interpolation formula, 144, 
166 (19) 
coefficient tables, 160, 161 
Stirling numbers: 
of first kind, 182 
of second kind, 185 
Stirling’s approximation to factorial, 104, 
233 (39) 
Stirling’s interpolation formula, 139 
coefficient tables, 159, 161 
Störmer’s method, 275 
Subtabulation, 195, 227 (21, 22) 
Successive substitutions, method of, 568, 
583 
Summation by parts, 348 
Summation of series, 197, 203, 346 
Symmetric matrix, 547, 643 
Synthetic division, 28, 588, 609 
with complex numbers, 633 (86), 637 
(117) 
with quadratic divisor, 609 


Taylor series, 6, 34, 245 
Theta operator, 177 
Thiele’s expansions, 510 
Throwback, 153, 161 
Transpose of matrix, 564 
Trapezoidal rule, 95 
with correction terms, 202 
Traub’s formula, 583 
Tridiagonal systems, 482, 559 
Crout’s method for, 559 
Trigonometric approximation, 447, 452, 
462 
Trigonometric interpolation, 115, 117 
(7) 
Truncation errors, 5 
Tschebycheff (see under Chebyshev) 


Undetermined coefficients, method of, 
282 
Uniform approximation, 4, 39, 475, 514 


Vandermonde’s determinant, 116 (5) 
Variance, 24 
estimated, 322 


Weierstrass’ theorem, 4 
Weighting coefficients, 86, 209, 385, 388 
Weighting functions, 107, 209, 315, 386 
Weights in least squares, 315, 322 
Wilkinson’s equation, 597 
Wilson’s system of equations, 549 